A linear time complexity community detection algorithm for large scale graph.
This is a spark implementation of
Zhao, W., Martha, V., & Xu, X. (2013, March). PSCAN: a parallel Structural clustering algorithm for big networks in MapReduce. In Advanced Information Networking and Applications (AINA), 2013 IEEE 27th International Conference on (pp. 862-869). IEEE
Which is the parallel version of:
X.Xu, N.Yuruk, Z. Feng, T. Schweiger. SCAN: a structural clustering algorithm for networks,Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 824-833, 2007.
val graph:Graph[Int, Int] = GraphLoader.edgeListFile(sc, "path_to_graph_file")
val components:Graph[VertexId, Int] = PSCAN.pscan(graph, epsilon = 0.2)
println("num communities: " + components.vertices.map{case (vId,cId)=>cId}.distinct.count)
println("nodes of every communities:")
components.vertices.map(v=>(v._2, v._1)).groupByKey().collect
.foreach(x=>println("%d: %s".format(x._1, x._2.mkString(" "))))
This repo is a re-organization of the PSCAN implementation in this repo with some minor change in order to make it more readable and easy to use.