A fast heuristic algorithm for community detection in large-scale complex networks. This is a reference implementation of the algorithm proposed in http://bdigital.unal.edu.co/69933/. This repository is integral part of the Master's thesis.
Papers:
- Dynamic Structural Similarity on Graphs: https://arxiv.org/abs/1805.01419
- Fast Heuristic Algorithm for Multi-Scale Hierarchical Community Detection: https://dl.acm.org/citation.cfm?doid=3110025.3110125
- High-Quality Disjoint and Overlapping Community Structure in Large-Scale Complex Networks: https://arxiv.org/abs/1805.12238
Please, check the branches for different versions of the algorithm.
- Added support for weighted graphs.
- Weighted Most Weak community definition has been removed since it performs quite similar to the Weighted Weak community definition. So the parameter -C from previous version is no longer required.
- Community definition and size detection are now merged into a single loop, so we guarantee the final result contains Weak communities with minimum size desired.
- Self-loops are removed.
- Zero degree vertices are skipped.
- The input graph is considered undirected.
$ make clean
$ make
Note: The compiler must be compatible with the C++11 standard.
--quiet -q
No verbose.
--input_file -i
Input file with the graph in edge list format: node1 node2 [weight].
--weighted -w
The input graph is weighted.
--output_file -o
Output file with the detected communities in format given by parameter -f.
--output_format -f
Output format of the detected communities: 0 for one pair node-community per line. 1 for one community per line
--gml -g
Graph in GML format with membership (Useful for visual exploratory analysis in Gephi). Only works if the -O option is not specified
--size_distri_file -z
Output file for community size distribution (Community size vs frequency).
--edge_sims_file -s
Output file for dynamic structural similarity for each edge in the graph (node1 node2 similarity).
--memb_dist_file -m
Output for membership distribution (number of memberships vs frequency).
--node_numbering -n
Index of the first node in the graph. Default to 0. Common values {0, 1}.
--min_comm_size -K
Minimum community size in the result. Default value 2.
--dss_iterations -I
Number of iterations to perform by the Dynamic Structural Similarity. Default to 2.
--overlap -O
Detect overlapping communities: 0 for fuzzy communities. 1 for crisp communities with threshold -T.
--crisp_threshold -T
Threshold used to generate the crisp overlapping communities. Default to 0.05. Common values {0.001, 0.005, 0.01, 0.02, 0.03, 0.04, 0.05}
./master -i input_graph.txt -o output_memberships.txt -g output_gml.gml
./master -i input_graph.txt -o output_communities.txt -O 1 -T 0.05