Methods for semi-supervised learning on graphs

DimBer/SSL_lib

OVERVIEW

The SSL program implements and tests several semi-supervised learning methods on multiclass or multilabel graphs with available ground-truth labels.

Two modes are available:

  • test: Takes as input a graph and labels for all nodes. Randomly samples a number of nodes (num_seeds) and predicts the labels of the remaining ones. The experiment is repeated a predefined number of times (num_iters), and the mean Micro F1 and Macro F1 scores are reported.
  • predict: This is the operational mode. It takes a graph and a file listing a subset of nodes with their labels. The selected method is run, and the predicted labels for all nodes of the graph are written to a predefined output file (outfile).
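The test-mode procedure described above can be illustrated with a small sketch for the multiclass case. It is not taken from the SSL sources: the function names `split_seeds` and `micro_macro_f1` are hypothetical, and the Macro F1 convention used here (averaging over every class that appears in either the true or the predicted labels) is an assumption that may differ from what SSL reports.

```python
import random
from collections import Counter


def split_seeds(nodes, num_seeds, rng):
    """Pick num_seeds labeled nodes at random; the rest are to be predicted."""
    seeds = set(rng.sample(sorted(nodes), num_seeds))
    return seeds, set(nodes) - seeds


def micro_macro_f1(true_labels, pred_labels):
    """Return (micro_f1, macro_f1) for two equal-length label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_labels, pred_labels):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was the true class and was missed

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    # Micro F1 pools the counts over classes; Macro F1 averages per-class F1.
    classes = set(true_labels) | set(pred_labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro
```

Micro F1 weights every node equally, while Macro F1 weights every class equally, so the two can diverge noticeably on graphs with unbalanced classes.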

Methods included:

  • PPR: Personalized PageRank
  • TunedRwR: Tuned random walk with restarts (see here)
  • AdaDIF: Adaptive Diffusions (see here)
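To give a feel for the simplest of these methods, here is a minimal sketch of label prediction with a truncated Personalized PageRank, where the walk is cut off after a fixed number of steps. This is a generic textbook formulation written for illustration, not the code in this repository: the names `truncated_ppr` and `predict`, the restart parameter `alpha = 0.85`, and the uniform restart over each class's seeds are all assumptions.

```python
def truncated_ppr(neighbors, seeds, walk_length=10, alpha=0.85):
    """Approximate personalized PageRank scores with a K-step random walk.

    neighbors: dict mapping each node to a list of its neighbors.
    seeds: labeled nodes of one class; the walk restarts uniformly on them.
    Every node reached by the walk is assumed to have at least one neighbor.
    """
    restart = {v: 1.0 / len(seeds) for v in seeds}
    current = dict(restart)  # walk distribution after 0 steps
    weight = 1.0 - alpha     # coefficient (1 - alpha) * alpha**k for k = 0
    scores = {v: weight * p for v, p in restart.items()}
    for _ in range(walk_length):
        nxt = {}
        for u, mass in current.items():
            share = mass / len(neighbors[u])  # spread mass evenly over edges
            for w in neighbors[u]:
                nxt[w] = nxt.get(w, 0.0) + share
        current = nxt
        weight *= alpha
        for v, p in current.items():
            scores[v] = scores.get(v, 0.0) + weight * p
    return scores


def predict(neighbors, labeled, walk_length=10, alpha=0.85):
    """Assign each node the class whose seeds give it the highest score."""
    by_class = {}
    for node, label in labeled.items():
        by_class.setdefault(label, []).append(node)
    class_scores = {
        c: truncated_ppr(neighbors, s, walk_length, alpha)
        for c, s in by_class.items()
    }
    return {
        v: max(class_scores, key=lambda c: class_scores[c].get(v, 0.0))
        for v in neighbors
    }
```

PPR uses fixed geometric weights over the walk steps. Going by their names, TunedRwR and AdaDIF instead tune or adapt the diffusion, which is outside the scope of this sketch.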

INPUT FILES FORMAT

SSL loads the graph in adjacency list format from a .txt file that contains edges as tab-separated pairs of node indexes in the format: node1_index \tab node2_index. Node indexes should be in the range [1, 2^64].
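As an illustration of this format, the following sketch reads tab-separated edge lines into an adjacency dictionary. The function name `read_edge_list` is hypothetical, and treating every edge as undirected is an assumption made here; the description above does not say whether each edge must be listed in both directions.

```python
def read_edge_list(lines):
    """Build an adjacency dict from 'node1<TAB>node2' lines.

    Each edge is stored in both directions (assumption: undirected graph).
    """
    neighbors = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected two tab-separated fields")
        u, v = int(parts[0]), int(parts[1])
        if u < 1 or v < 1:
            raise ValueError(f"line {line_no}: node indexes start at 1")
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors
```

The function accepts any iterable of strings, so an open file handle can be passed directly, for example `read_edge_list(open("graphs/BlogCatalog/adj.txt"))`.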

For multiclass graphs, the labels are loaded from a .txt file where each line is of the format: node_index \tab label. Labels must be integers in [-127, 127].
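A multiclass label file of this form could be validated before it is handed to SSL with a sketch like the one below. The function name `read_multiclass_labels` is hypothetical; only the line format and the [-127, 127] range come from the description above.

```python
def read_multiclass_labels(lines):
    """Parse 'node_index<TAB>label' lines into a dict {node: label}."""
    labels = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected node_index<TAB>label")
        node, label = int(parts[0]), int(parts[1])
        if not -127 <= label <= 127:
            raise ValueError(f"line {line_no}: label {label} outside [-127, 127]")
        labels[node] = label
    return labels
```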

For multilabel graphs, labels are loaded from a .txt file in compressed one-hot-matrix form (see graphs/HomoSapiens/class.txt for an example).

In test mode, all nodes must be labeled (present in the label file).

In predict mode, any subset of nodes can be labeled.

OUTPUT FILES FORMAT

  • Multiclass: Similar to input, each line is node_index \tab predicted_label
  • Multilabel: The output for multilabel graphs is a ranking for every node. Each line follows the format node_index: \tab pred_1 pred_2 ... pred_c, where pred_i is the i-th most probable label for this node.
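For downstream processing, a line of the multilabel output could be parsed as sketched below. The function names are hypothetical, and the sketch assumes that the ranked labels after the tab are integers separated by whitespace, which the format description suggests but does not state explicitly.

```python
def parse_multilabel_line(line):
    """Split 'node_index:<TAB>pred_1 pred_2 ...' into (node, ranking)."""
    head, _, tail = line.strip().partition("\t")
    node = int(head.rstrip(": "))             # drop the trailing colon
    ranking = [int(tok) for tok in tail.split()]
    return node, ranking


def top_k_labels(lines, k):
    """Keep only the k most probable labels for every node."""
    result = {}
    for line in lines:
        if line.strip():
            node, ranking = parse_multilabel_line(line)
            result[node] = ranking[:k]
    return result
```

Because the output is a full ranking rather than a label set, the caller has to decide how many labels to keep per node; `top_k_labels` shows the simplest choice, a fixed cutoff.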

COMPILATION

Dependencies: BLAS and pthread must be installed.

Command line: make clean, then make

EXECUTION

Command line: ./SSL [OPTIONS]

OPTIONS

Command line optional arguments with values:

| ARGUMENT | VALUES | DEFAULT | DESCRIPTION |
|---|---|---|---|
| --mode | test, predict | test | Operational mode (see Overview) |
| --method | Tuned_RwR, AdaDIF, PPR | AdaDIF | Selection of prediction method (see Overview) |
| --graph_file | (adjacency list).txt | graphs/BlogCatalog/adj.txt | See Input Files Format |
| --label_file | (label list or one-hot).txt | graphs/BlogCatalog/class.txt | See Input Files Format |
| --outfile | (predicted labels).txt | out/label_predictions.txt | File where predictions are stored when --mode = predict (see Output Files Format) |
| --num_seeds | [1, 2^16] | 1030 | Number of nodes that are labeled (only works when --mode = test) |
| --walk_length | [1, 2^16] | 10 | Length of AdaDIF (and/or PPR) random walk |
| --lambda_trwr | >=0.0 | 1.0 | Regularization parameter for Tuned RwR method |
| --lambda_addf | >=0.0 | 5.0 | Smoothness over the graph regularization parameter for AdaDIF method |
| --num_iters | [1, 2^16] | 1 | Number of experiments performed (only works when --mode = test) |

Default values can be changed by editing defs.h

Command line optional arguments without values:

| ARGUMENT | RESULT |
|---|---|
| --unconstrained | switches AdaDIF to unconstrained mode |
| --single_thread | forces single thread execution |
| --multiclass | specifies multiclass input / output (default is multilabel) |
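When SSL is driven from a script, the options above can be assembled into an argument list as in the following sketch. The helper `build_ssl_command` is hypothetical, and passing each value as a separate token after its flag (`--mode test`) is an assumption; check how the binary parses its arguments before relying on it.

```python
def build_ssl_command(binary="./SSL", flags=(), **options):
    """Assemble an argv list for the SSL binary.

    options: valued arguments, e.g. mode="test", num_seeds=1030.
    flags:   value-less switches, e.g. ["multiclass", "single_thread"].
    """
    argv = [binary]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]  # assumed form: --name value
    argv += [f"--{flag}" for flag in flags]
    return argv


# Example: a 10-run multiclass test with AdaDIF on the default graph.
command = build_ssl_command(
    mode="test",
    method="AdaDIF",
    num_seeds=1030,
    num_iters=10,
    flags=["multiclass"],
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the directory that contains the compiled binary.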
