-
Problem Formulation: We introduce a new problem, personalized graph summarization, and demonstrate the usefulness of personalizing summary graphs.
-
Algorithm Design: We propose PeGaSus (Personalized Graph Summarization with Scalability), a linear-time algorithm for the problem. We show empirically that it scales to a graph with one billion edges.
-
Extensive Experiments: We exhibit the effectiveness of PeGaSus and its applicability to distributed multi-query answering using six real-world graphs.
The main paper is available at Here. If a preview does not appear properly, please download this file.
The online appendix is available at Here. If a preview does not appear properly, please download this file.
The algorithms used in the paper is available at ./PeGaSus/
.
Name | #Nodes | #Edges | Summary | Source | Download |
---|---|---|---|---|---|
LastFM-Asia (LA) | 7,624 | 27,806 | Social | SNAP | LINK |
Caida (CA) | 26,475 | 53,381 | Internet | SNAP | LINK |
DBLP (DB) | 317,080 | 1,049,866 | Collaboration | SNAP | LINK |
Amazon0601 (A6) | 403,364 | 2,443,311 | Co-purchase | SNAP | LINK |
Skitter (SK) | 1,694,616 | 11,094,209 | Internet | SNAP | LINK |
Wikipedia (WK) | 3,174,745 | 103,310,688 | Hyperlinks | KONNECT | LINK |
- >= OpenJDK 12
PeGaSus assumes that the input graph G = (V, E) is undirected without self-loops. The format of the example file is given below. Each line represents an edge. Each edge {u, v} ∈ E joins two distinct nodes u != v ∈ V, separated by a tab. Each node v ∈ V is assigned to a unique integer id.
0 1
2 3
3 4
- The example consists of 5 nodes with 3 edges.
PeGaSus personalizes a summary graph for a target node set T ⊆ V. The format of the example file is given below. Each line contains the unique id of a node v ∈ V.
4721
274
2195
- The example target node set consists of 3 nodes.
java -jar PeGaSus.jar [data path] [target compression ratio] [target node set file path] [checking personalized error] [saving summary graph] [alpha] [beta]
- data path: path to the input text file
- target compression ratio [0,1]: desired size of a summary graph compared relative to the input graph size in bits
- target node set file path: path to the target node set file
- checking personalized error: if true, compute the personalized error. Otherwise, not.
- saving summary graph: if true, save a summary graph. Otherwise, not.
- alpha: degree of personalization
- beta: parameter for adaptive thresholding
java -jar PeGaSus.jar ./data/lastfm_asia.txt 0.5 ./data/lastfm_asia_TNS.txt true true 1.25 0.1
The output file contains information about nodes (nodes in G) belonging to each supernode S of the output summary graph G̅ = (S, P) and information about each superedge P. The first integer on each line following the line "<Nodes in Each Supernode>" represents the id of the supernode, and the following integers separated by tabs represent the ids of the nodes belonging to that supernode. Each line following the line "<Superedge Info>" represents a single superedge. The two integers separated by tabs represent the id of the source supernode and the id of the destination supernode.
- Output
dataPath: ./lastfm_asia.txt
Target Ratio: 0.1
Target Node Set File Name: ./lastfm_asia_TNS.txt
is Checking Personalized Error: true
is Saving Summary Graph: true
Alpha: 1.25
Beta: 0.1
|V|: 7624
|E|: 27806
TopNDrop
Compression Ratio 0.1 Execution time 0.788s
Personalized Error 0.001089218172509844
- Output file
<Nodes in Each Supernode>
2569 3145 2225 3975 2306 498 428 1464
2775 5346 2162 4012 3774
.
.
<Superedge Info>
2569 2569
2775 2775
1481 1357
.
.