Remote repository of public-domain network datasets (along with their ground-truth clusterings) for the CDlib library.
For instructions on how to load the data within CDlib, refer to the official documentation.
Here is the list of available network datasets, both real and synthetically generated.
Network Name | Network Type | Upstream |
---|---|---|
Karate Club | Social | UCINET |
Youtube | Social | SNAP |
DBLP | Scientific Collaboration | SNAP |
Amazon | Co-Purchases | SNAP |
LFR Benchmark datasets:
A set of networks with planted community partitions, generated using the NetworkX implementation of the Lancichinetti-Fortunato-Radicchi (LFR) benchmark:
“Benchmark graphs for testing community detection algorithms”, Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi, Phys. Rev. E 78, 046110 (2008).
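In the LFR model, the mixing parameter mu is the fraction of each node's links that connect to nodes outside its own community, so a graph's planted partition can be sanity-checked by measuring that fraction. The sketch below does this in plain Python. The on-disk format of the ground-truth files is not described here, so the hypothetical helper `empirical_mixing` assumes the graph is given as an edge list and the partition as a node-to-community dict (one community per node).

```python
from collections import defaultdict


def empirical_mixing(edges, membership):
    """Average fraction of each node's edges that leave its community.

    Hypothetical helper. Assumes an undirected edge list of (u, v) pairs and
    a dict mapping every node to a single community label.
    """
    total = defaultdict(int)     # degree of each node
    external = defaultdict(int)  # edges to nodes in other communities
    for u, v in edges:
        # Count the edge from both endpoints (undirected graph).
        for a, b in ((u, v), (v, u)):
            total[a] += 1
            if membership[a] != membership[b]:
                external[a] += 1
    # Per-node ratio, averaged over nodes that have at least one edge.
    ratios = [external[node] / total[node] for node in total]
    if not ratios:
        return 0.0
    return sum(ratios) / len(ratios)
```

For two triangles joined by a single edge, only the two bridge endpoints have an external link (1 of 3 each), so the average over six nodes is 1/9. On a generated LFR graph the measured value is expected to be close to, but not exactly, the nominal mu.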
Dataset names follow the pattern:
LFR_N{number of nodes}_ad{average degree}_mc{min community size}_mu{mixing coefficient}
where:
- number of nodes: [1000, 5000, 10000, 50000, 100000]
- average degree: [5]
- min community size: [50]
- mixing coefficient: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
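The parameter grids above yield 5 × 1 × 1 × 9 = 45 LFR datasets. A minimal sketch that expands the naming pattern into the full list is shown below. It assumes each value is rendered with Python's default number formatting (e.g. mu = 0.1 becomes `mu0.1`); that rendering is an assumption, so check it against the actual file names in the repository. The function name `lfr_dataset_names` is hypothetical.

```python
from itertools import product

# Parameter grids as listed above.
NODES = [1000, 5000, 10000, 50000, 100000]
AVG_DEGREE = [5]
MIN_COMMUNITY = [50]
MU = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def lfr_dataset_names():
    """Expand LFR_N{n}_ad{ad}_mc{mc}_mu{mu} over every parameter combination.

    Assumption: values use Python's default str() formatting, e.g. "mu0.1".
    """
    return [
        f"LFR_N{n}_ad{ad}_mc{mc}_mu{mu}"
        for n, ad, mc, mu in product(NODES, AVG_DEGREE, MIN_COMMUNITY, MU)
    ]


names = lfr_dataset_names()
print(len(names))   # 45
print(names[0])     # LFR_N1000_ad5_mc50_mu0.1
```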
The power-law exponent of the degree distribution is fixed at 3, and that of the community size distribution at 1.5.
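Because the two exponents are fixed and not encoded in the name, a dataset name together with these constants fully determines the generator parameters. The sketch below parses a name back into a dict whose keys follow the keyword names of NetworkX's `LFR_benchmark_graph` (`n`, `tau1`, `tau2`, `mu`, `average_degree`, `min_community`). The helper `lfr_parameters` and its regular expression are hypothetical, written only against the naming pattern above.

```python
import re

# Matches LFR_N{nodes}_ad{average degree}_mc{min community size}_mu{mixing}.
_LFR_NAME = re.compile(
    r"^LFR_N(?P<n>\d+)_ad(?P<ad>\d+(?:\.\d+)?)"
    r"_mc(?P<mc>\d+)_mu(?P<mu>\d+(?:\.\d+)?)$"
)


def lfr_parameters(name):
    """Recover LFR generator parameters from a dataset name (hypothetical helper)."""
    match = _LFR_NAME.match(name)
    if match is None:
        raise ValueError(f"not an LFR dataset name: {name!r}")
    return {
        "n": int(match["n"]),
        "tau1": 3,    # degree distribution exponent (fixed, not in the name)
        "tau2": 1.5,  # community size distribution exponent (fixed, not in the name)
        "mu": float(match["mu"]),
        "average_degree": float(match["ad"]),
        "min_community": int(match["mc"]),
    }


print(lfr_parameters("LFR_N1000_ad5_mc50_mu0.1"))
```

The resulting dict can be unpacked into `networkx.LFR_benchmark_graph(**params, seed=...)` to generate a graph with the same parameters. Since the seed used for the published datasets is not stated here, such a graph would be a fresh sample rather than a copy of the stored one.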