GitHub - slackner/anti-community: Anti-community Detection

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
datasets		datasets
src		src
third-party		third-party
.gitignore		.gitignore
Makefile		Makefile
README		README

Repository files navigation

Anti-Community Detection
########################

Installation
============

As a first step, please make sure that all required dependencies are installed.
The following commands can be used to install missing packages on Ubuntu or
Debian operating systems:

$ sudo apt-get install build-essential
$ sudo apt-get install libgsl-dev

Afterwards, just extract the source code to a directory of your choice and run:

$ make

If everything goes well, the program terminates with exitcode 0. If not, the
error message should hopefully contain a hint what is going wrong. All
compiled programs will be located within the src/ and third-party/ directory.


Self-Test
=========

After the compilation finished, it makes sense to run the included self-tests
to check that everything works fine. The tests are all located in the src/
folder, so we first have to switch to the right directory:

$ cd src

To run the self-tests of the library itself, execute:

$ make test

To run tests for the Python bindings, execute:

$ ./pygraph.py


Graph File Format
=================

Most programs contained in this package require graph data with the following
input format:

--- snip ---
NumberOfNodes
StartNode EndNode [Weight]  \
StartNode EndNode [Weight]   } Edges
StartNode EndNode [Weight]  /
[...]
--- snip ---

The first line defines the number of nodes, all following lines describe the
edges (first node index, second node index, optional weight). Lines starting
with "#" are comments and have to be ignored. An example for a trivial graph
can be seen in src/data/example-trivial.graph.


Programs
========

The current version of the package contains the following programs. Each
program also has a separate help, which can be shown by running
"PROGRAM --help", e.g., "src/laprop --help".

$ src/gml
	Convert graph to GML file format.

$ src/laprop
	Apply label propagation algorithm to the graph specified by the given file.

$ src/maxmodularity
	Agglomerative hierarchical clustering algorithm for community detection /
	modularity maximization.

$ src/minmodularity
	Agglomerative hierarchical clustering algorithm for anti- community
	detection / modularity minimization.

$ src/antimodularity
	Agglomerative hierarchical clustering algorithm for anti- community
	detection / anti-modularity maximization.

$ src/components
	Return connected components of an undirected graph.

$ src/metrics
	Compute modularity and anti-modularity for the graph specified by the given
	file. When a ground-truth label file is given, also compute cluster quality
	metrics.

$ src/subgraph
	Extract subgraph specified by the given files.


Datasets
========

The datasets directory contains a collection of various small graphs. The
graphs already have been converted to the correct format. Each directory
contains a file called "result.graph".

After changing the implementation of an algorithm, it is sufficient to execute:

$ cd datasets
$ make clean
$ make

to run all evaluations again. This might take a few minutes. In each directory,
a "metrics.txt" file will be created, which contains common measures (such as
the number of communities, Modularity, Antimodularity, etc.) and can be used to
compare different algorithms.


Usage Example
=============

Community Detection
-------------------

As an example, the following commandline applies a Modularity minimization
algorithm to the Adjectives and Nouns dataset.

$ src/minmodularity datasets/adjectives-noun/result.graph

By default, the output (a vector of vertex labels) is written directly to the
terminal. Nevertheless, if preferred, it can also be redirected to a file:

$ src/minmodularity datasets/adjectives-noun/result.graph > output.txt

Note that some programs might have additional parameters, which are passed
through to the (anti-)community detection method. Please check "PROGRAM --help"
to get a list of available options.

Evaluation
----------

To compute the modularity and anti-modularity measure, as well as some other
statistical values for a community detection result, just run:

$ ./src/metrics --clusters=datasets/adjectives-noun/minmodularity.txt \
$               --groundtruth=datasets/adjectives-noun/ground-truth.txt \
$               datasets/adjectives-noun/result.graph

The output will contain one measure per line. To write the result to a file,
the operator ">" can be used, as in the example above.

Plotting
--------

To create plots suitable for publication, we recommend the tool Gephi available
from https://gephi.org/. As a prepartion, use the "gml" program to create a
file containing all attributes required for plotting, for example:

$ ./src/gml --base=datasets/adjectives-noun/ground-truth.txt \
$           --algo=datasets/adjectives-noun/minmodularity.txt \
$           datasets/adjectives-noun/result.graph > import.gml

Note that you can add an arbitrary number of attributes by adding additional
--NAME=FILENAME arguments to the commandline. When you have selected all
attributes you need, import the output file (here "import.gml") into Gephi.

To color vertices based on an attribute, select the menu "Partition" in the
side bar "Appearance" on the left. Choose one of the attributes (e.g., "algo"),
and then click on "Apply" to confirm your changes.

Arranging the vertices based on attributes requires a bit more effort: Create
a new filter "Library -> Attributes -> Partition -> base" using the
side bar "Filters" on the right. Then select one partition at a time, click
on the "Filter" button in the bottom right corner, and apply a Layout
method of your choice (e.g., "Fruchterman Reingold"). When this is done,
use the "Drag" cursor to move the cluster to a location of your choice,
and proceed with the next partition. Note that you can change the selection
radius by clicking on "Configure" in the top part of the "Graph" view. (If
there is an easier way to achieve a similar result, please let me know!)


References
==========

 - S. Lackner, A. Spitz, M. Weidemüller, and M. Gertz. 2018. Efficient
   Anti-community Detection in Complex Networks. Proceedings of the 30th
   International Conference on Scientific and Statistical Database
   Management (SSDBM '18), July 9-11, 2018, Bozen-Bolzano, Italy. ACM,
   NY, USA. https://doi.org/10.1145/3221269.3221289

 - L. Chen, Q. Yu, and B. Chen. 2014. Anti-modularity and anti-community
   detecting in complex networks. Inf. Sci. 275 (2014), 293–313.
   https://doi.org/10.1016/j.ins.2014.02.040

 - M. E. J. Newman. 2004. Fast algorithm for detecting community
   structure in networks. Phys. Rev. E 69, 6 (2004), 5.
   https://doi.org/10.1103/PhysRevE.69.066133

 - M. E. J. Newman and M. Girvan. 2004. Finding and evaluating community
   structure in networks. Phys. Rev. E 69, 2 (2004), 15.
   https://doi.org/10.1103/PhysRevE.69.026113

 - M. E. J. Newman and G. Reinert. 2016. Estimating the number of
   communities in a network. Phys. Rev. Lett. 117 (2016), 5. Issue 7.
   https://doi.org/10.1103/PhysRevLett.117.078301