skeleton:
data to operate with:
data | type | functionality |
---|---|---|
min_cn |
edge property array | at beginning, min common neighbors to be similar, then similar(-1) or not similar(-2) indicator. some of them not evaluated by pruning |
is_core |
vertex property array | at begining, bloom filter, after checking core second phase, the result of cores status for all vertices |
is_non_core |
vertex property array | bloom filter, used in check core second phase |
disjoint-set |
union-find set | inside two arrays, parent and rank , union-by-rank, compress-in-find-root, each tree represent a set |
cluster_id |
vertex property array | for looking up min core vertex id in this cluster represented by root. in other words, cluster_id[FindRoot(u)] is this cluster id |
all phases are executed in parallel where disjoint-sets adopts lock-free design and other parts do not incur data race.
intentions in each phase:
-
1 Prune(parallel): initialize some values in
min_cn
, which can be easily computated with arithmetic operations instead of set-intersection computations. There are two types, first is definitely not similar two vertices, second is defintely similar two vertices. -
2.1 CheckCoreFirstPhase(parallel): evaluate similarities for relationships from source vertex
u
to destination vertexv
for allu<v
and update corresponding similarity from source vertexv
to destination vertexu
. Min-max pruning is applied in this phase. Mark definite cores and definite non-cores withis_core
andis_non_core
array. -
2.2 CheckCoreSecondPhase(parallel): since the existence of min-max pruning, some vertices can not be determined by two bloom filters,
is_core
andis_non_core
, thus similarities sourcing from these vertices that are not evaluated are evaluated and relationship is established for reversed edge. -
3.1 ClusterCoreFirstPhase(parallel): without further evaluation of similarities, union similar core-cores.
-
3.2 ClusterCoreSecondPhase(parallel): some similarity predicates of core-cores are not know, evaluate them only when they are not in the same tree in the disjoint-set yet, if they are similar, union them.
-
Trivial Preparing: Line 11 to 14 is for initializing
cluster_id
for looking upcluster_id[FindRoot(u)]
. -
4.1 ClusterNonCoreFirstPhase(parallel): evaluate some unknow similarities among cores and non-co res.
-
4.2 ClusterNonCoreSecondPhase(parallel): cluster non-cores when they are similar to cores
dataset: orkut, eps:0.2
, min_pts:5
previous:
improved:
reduced number of border checking and element comparison branch statement
type | prev clockticks | improved clockticks |
---|---|---|
border checking | 150 billion | 45 billion |
element cmp branch statement | 320 billion | 230 billion |
min-max value, offset increment | 150 billion | 140 billion |