community detection for the whole Twitter graph on a single laptop

melifluos/LSH-community-detection

LSH-community-detection

This is the code for our paper 'Real-Time Community Detection in Large Social Networks on a Laptop' (https://arxiv.org/abs/1601.03958). It performs community detection for large networks on a single laptop. We use minhash signatures to encode the Jaccard similarity between the neighbourhood graphs of vertices in social networks. A Locality Sensitive Hashing (LSH) table is built on top of the minhashes to perform fast approximate nearest neighbour search. The results of the nearest neighbour search are ranked and structured using the WALKTRAP community detection algorithm.
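As a rough illustration of how minhash signatures encode Jaccard similarity, the sketch below builds signatures for two toy neighbour sets and compares the estimated similarity with the exact value. It is a minimal stand-alone example and not the code in this repository: the function names, the `num_hashes` parameter and the toy sets are hypothetical, and the standard library's `hashlib` is used as a stand-in for `mmh3` so that the example has no dependencies.

```python
import hashlib


def seeded_hash(seed, item):
    """Deterministic 64-bit hash of `item`, salted by `seed`.

    Assumption: hashlib.blake2b stands in for mmh3 here.
    """
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(neighbours, num_hashes=256):
    """One minimum per seeded hash function over the neighbour set."""
    return [
        min(seeded_hash(seed, v) for v in neighbours)
        for seed in range(num_hashes)
    ]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minima agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical neighbour sets: 40 shared members out of 120 distinct ones.
neighbours_a = {f"user{i}" for i in range(0, 80)}
neighbours_b = {f"user{i}" for i in range(40, 120)}

sig_a = minhash_signature(neighbours_a)
sig_b = minhash_signature(neighbours_b)

exact = exact_jaccard(neighbours_a, neighbours_b)
estimate = estimate_jaccard(sig_a, sig_b)
print(f"exact={exact:.3f} estimate={estimate:.3f}")
```

For each hash function, the probability that two sets share the same minimum equals their Jaccard similarity, so the fraction of matching positions is an estimate whose error shrinks as `num_hashes` grows.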

Prerequisites

The code uses the numpy, pandas and scikit-learn Python packages. We recommend installing these through Anaconda. Generating minhashes also requires the mmh3 package:

pip install mmh3

We provide binaries of the Cython code. If you wish to alter the Cython code, you will need to install Cython:

pip install cython

Replicating the experiments with Twitter data

Download the minhash data available from DANS EASY:

https://doi.org/10.17026/dans-x6a-mgvm

The commands below assume that you have cloned this repository and are in the directory containing the source code.

To build the LSH table

python LSH.py minhash_data_path LSH_output_path

To generate metrics for the ground truth communities

python assess_community_quality.py minhash_data_path outpath

To run experimentation

python run_experiments.py minhash_data_path LSH_output_path outpath

Replicating the end-to-end process with the public email data set from SNAP https://snap.stanford.edu/data/email-EuAll.html

python run_email_data.py

This will generate minhashes from the raw data and use them to build an LSH table. From the LSH table all of the results shown in the paper are generated.

The LSH table and the minhashes are written to the resources folder. The plots are written to the results folder.
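To give a feel for what an LSH table built on minhashes does, here is a minimal sketch of the common banding scheme: each signature is split into bands, and items that agree on every row of at least one band land in the same bucket and become nearest neighbour candidates. This is an assumption about the general technique, not a description of how LSH.py is implemented. The names `build_lsh_table` and `query`, the band and row counts, and the hand-written toy signatures are all hypothetical.

```python
from collections import defaultdict


def band_keys(signature, bands, rows):
    """Split a signature into `bands` chunks of `rows` values each."""
    if len(signature) != bands * rows:
        raise ValueError("signature length must equal bands * rows")
    return [
        (b, tuple(signature[b * rows:(b + 1) * rows]))
        for b in range(bands)
    ]


def build_lsh_table(signatures, bands, rows):
    """Map each (band index, band contents) key to the items sharing it."""
    table = defaultdict(set)
    for item, signature in signatures.items():
        for key in band_keys(signature, bands, rows):
            table[key].add(item)
    return table


def query(table, signature, bands, rows):
    """Return every item that shares at least one band with `signature`."""
    candidates = set()
    for key in band_keys(signature, bands, rows):
        candidates |= table.get(key, set())
    return candidates


# Hand-written toy signatures (3 bands of 2 rows), not real minhashes.
signatures = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, 2, 9, 9, 5, 6],  # matches "a" in the first and third bands
    "c": [7, 7, 3, 4, 8, 8],  # matches "a" in the second band only
    "d": [0, 0, 0, 0, 0, 0],  # matches nothing else
}

table = build_lsh_table(signatures, bands=3, rows=2)
candidates_for_a = query(table, signatures["a"], bands=3, rows=2)
print(sorted(candidates_for_a))
```

A query only inspects the buckets that its own bands map to, so lookup cost depends on bucket sizes rather than on the total number of items. More rows per band makes matches stricter; more bands makes them more permissive.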

Authors

Ben Chamberlain

Citation

If you make use of this code please cite:

Chamberlain BP, Levy-Kramer J, Humby C, Deisenroth MP. Real-Time Community Detection in Large Social Networks on a Laptop. arXiv preprint arXiv:1601.03958. 2016 Jan 15.
