
All the code required to reproduce the results in our paper "Scaling Up Structural Clustering to Large Probabilistic Graphs Using Lyapunov Central Limit Theorem"


JoetheManHowie/NUSCAN


Introduction


This repository contains all the code and instructions necessary to reproduce the experimental results from our paper Scaling Up Structural Clustering to Large Probabilistic Graphs Using Lyapunov Central Limit Theorem.

Overview


There are four directories, each with its own purpose:

  1. prep_graphs/ contains all the Python code used to format datasets for the clustering algorithms NUSCAN and USCAN.
  2. uscan/ holds the C++ implementation of USCAN as coded by the authors of Qiu et al., with a few additions needed for our analysis.
  3. nuscan/ holds the modified USCAN code that includes the NUSCAN algorithm.
  4. analysis/ has the scripts used to analyze the clusters and probability calculations made by both algorithms.

Each of these directories contains more specific instructions for using the code inside it.

Workflow


In general, reproducing the analysis from our paper involves the following steps:

  1. Format graph - NUSCAN and USCAN both operate on undirected probabilistic graphs. See prep_graphs/ for more information on the formatting requirements.
  2. Run both clustering algorithms - run the graph through NUSCAN and USCAN with the output option enabled, so that each writes one text file with the probabilities $P[e, \varepsilon]$ for each edge $e$ and another text file with the cluster sets, hubs, and outliers. See uscan/ and nuscan/ for more details.
  3. Analyze results - the scripts compute cluster quality, compare $P[e, \varepsilon]$ between the two methods, and compare the cluster, hub, and outlier sets. See analysis/ for more direction.
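
As a rough illustration of the comparison in step 3, the sketch below summarizes how far two sets of edge probabilities are from each other. It is not the code in analysis/: the function names `edge_key` and `compare_probabilities` are hypothetical, and the sketch assumes the probabilities have already been loaded into dictionaries keyed by edge, since the on-disk format is documented in uscan/ and nuscan/ rather than here.

```python
def edge_key(u, v):
    """Return an order-independent key for an undirected edge (hypothetical helper)."""
    return (u, v) if u <= v else (v, u)


def compare_probabilities(p_a, p_b):
    """Summarize the absolute differences between two edge -> probability maps.

    Only edges present in both maps are compared. Both maps are assumed to be
    keyed with edge_key(); loading them from the .prob files is not shown.
    """
    common = p_a.keys() & p_b.keys()
    if not common:
        raise ValueError("the two maps share no edges")
    diffs = [abs(p_a[e] - p_b[e]) for e in common]
    return {
        "edges": len(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "max_abs_diff": max(diffs),
    }


# Made-up values, for illustration only.
nuscan_probs = {edge_key(2, 1): 0.5, edge_key(1, 3): 0.25}
uscan_probs = {edge_key(1, 2): 0.75, edge_key(3, 1): 0.25, edge_key(4, 5): 1.0}
summary = compare_probabilities(nuscan_probs, uscan_probs)
```

Keying by a normalized pair means an edge listed as (2, 1) in one output and (1, 2) in the other is still matched, which fits the undirected graphs both algorithms operate on.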

Notes

Both NUSCAN and USCAN have the option to output two text files: <graphfile>-eta-eps-mu-thres.cluster_nuscan and <graphfile>-eta-eps-mu-thres.prob_nuscan (for USCAN, thres is not present and the suffix is "_uscan"). Both files are required to run the code in analysis/, which assumes the formatting produced by NUSCAN and USCAN. See uscan/ and nuscan/ for more information on the files produced, and see analysis/ for more information on the analyses performed on them.
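
The naming pattern above can be sketched as a small helper that builds the two expected file names for a run. This is an assumption-laden illustration: `output_names` is a hypothetical function, not part of the repository, and the way each parameter value is rendered into the name (here, Python's default `str()`) may differ from what the C++ code writes, so check the actual output in uscan/ and nuscan/ before relying on it.

```python
def output_names(graphfile, eta, eps, mu, thres=None):
    """Build the (cluster, prob) file names for one run (hypothetical helper).

    With thres given, the NUSCAN pattern is used; with thres=None, the USCAN
    pattern is used (no thres field, "_uscan" suffix). Number formatting via
    str() is an assumption and may not match the real output files.
    """
    parts = [str(graphfile), str(eta), str(eps), str(mu)]
    if thres is None:
        algo = "uscan"
    else:
        parts.append(str(thres))
        algo = "nuscan"
    stem = "-".join(parts)
    return f"{stem}.cluster_{algo}", f"{stem}.prob_{algo}"


# Made-up parameter values, for illustration only.
nuscan_files = output_names("graph.txt", 0.5, 0.5, 2, thres=100)
uscan_files = output_names("graph.txt", 0.5, 0.5, 2)
```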
