-
Notifications
You must be signed in to change notification settings - Fork 2
An implementation of Bespoke, a semi-supervised community detection algorithm
License
abaxi/bespoke-icdm18
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Author email: [email protected] ############################################# #####Dependencies ############################################# python 2.7 numpy 1.11.0 scipy 0.17.0 scikit learn 0.16.0 pylab 1.5.1 ############################################# #####Cite ############################################# Bakshi, Arjun, Srinivasan Parthasarathy, and Kannan Srinivasan. "Semi-Supervised Community Detection Using Structure and Size." 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 2018. ############################################# #####Input file and formats ############################################# Community file: 1 community per line. Line should contain TAB (\t) separated list of node IDs in that community. Node IDs need to be integers. Example: ------------------ 1 22 34 102 45 243 43 12 55 324 ... .. . ------------------ Graph file: 1 edge per line. A line should have 2 nodes IDs separated by a TAB(\t). Node IDs need to be integers. Example: ------------------ 1 2 44 4 12 43 ... ... . ------------------ Node labels file: 1 node-label pair per line. Node ID followed by numeric/integer label. If there are N distinct labels, then the labels should range from 0 to N-1. Should be TAB separated. Node IDs and labels need to be integers. Ex: ------------------ 1 0 22 4 13 4 4 4 11 3 14 0 ... .. . ------------------ ############################################# #####Generating node labels file ############################################# If you do not have a node label file, one can be generated using the "label_nodes.py" script. Usage: python label_nodes.py <graph_src> <#node labels> <output filename> [--verbose] Generally, setting #node labels to 4 or 5 works best. ############################################# #####Running Bespoke ############################################# Bespoke can be run using the "run_bespoke.py" python file. usage: python run_bespoke.py nw_src tr_src num_find ls op [--np NP] [--eval_src EVAL_SRC] positional arguments: nw_src Graph file. One edge per line. tr_src File containing training comms. num_find Number communities to extract. ls File containing node labels. op Write discovered communities to this file. optional arguments: --np NP Number of subgraph patterns to extract. Default=5 --eval_src EVAL_SRC File containing comms. to evaluate against. -h, --help show this help message and exit ############################################# #####Example: Running Bespoke ############################################# Download the dblp graph and top5000 communities dataset from Standford's SNAP website. On ubuntu/mac/linux run the following commands: 1. python label_nodes.py com-dblp.ungraph.txt 4 dblp_labels.txt --verbose That will generate a file with node to label mapping called dblp_labels.txt Output: ### Generating node labels ### processing com-dblp.ungraph.txt graph loaded in 3.55 seconds ----------------- jaccard percentiles computed in 42.5 seconds Jaccard node features extracted/written in 0.27 seconds ----------------- node features loaded nodes clustered Node labels written to file: dblp_labels.txt ### 2. tail -n 100 com-dblp.top5000.cmty.txt > train.txt Use last 100 communities as training data 3. head -n -100 com-dblp.top5000.cmty.txt > test.txt Use everything except 100 last 100 as test or evaluation dataset 4.python run_bespoke.py com-dblp.ungraph.txt train.txt 50000 dblp_labels.txt op.txt --eval_src test.txt Run bespoke. Generate 50k community guesses, evaluate them against communities in test.txt Output: #### Input arguments ### eval_src --> test.txt tr_src --> train.txt num_find --> 50000 ls --> dblp_labels.txt np --> 5 nw_src --> com-dblp.ungraph.txt op --> op.txt #### ### Beginning Bespoke ### Training... Training complete... Beginning extraction... Extraction complete. total_time(s): 116.09 training_time(s): 103.69 Writing extracted communities to file: op.txt ... done ### End of Bespoke ### Scoring extracted communities (may take a lot of time)... F1 Score: 0.541 ###### I would also recommend looking at another user's implementation of Bespoke. It is supposed to be faster overall. Link: https://github.com/yzhang1918/bespoke-sscd
About
An implementation of Bespoke, a semi-supervised community detection algorithm
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published