- BenchTemp: A General Benchmark for Evaluating Temporal Graph Neural Networks
- 5/1/2024 - The Myket dataset used in the paper (https://arxiv.org/abs/2308.06862) has been added to BenchTemp. The preprocessing code is at preprocess/Myket.py.
- 2/9/2023 - All datasets are hosted on the open-source platform Zenodo (https://zenodo.org/) with the Digital Object Identifier (DOI) 10.5281/zenodo.8267846 (https://zenodo.org/record/8267846).
- 20/8/2023 - We have added four large-scale datasets (eBay-Large, DGraphFin-Large, YouTubeReddit-Large, and Taobao-Large).
- 12/7/2023 - We have uploaded the experimental code to the folder experimental_codes.
- 25/6/2023 - We have updated the BenchTemp website.
- 24/6/2023 - We have updated the reference of BenchTemp on GitHub.
BenchTemp is a general benchmark Python library for evaluating Temporal Graph Neural Networks (TGNNs) quickly and efficiently on various workloads. BenchTemp provides benchmark datasets and unified pipelines (DataPreprocessor, DataLoader, EdgeSampler, Evaluator, EarlyStopMonitor, BenchTempLoss, BenchTempOptimizer, and Leaderboard) for evaluating TGNNs on both the link prediction task and the node classification task.
- Datasets - https://zenodo.org/record/8267846
- Code - https://github.com/qianghuangwhu/benchtemp
- Leaderboards - https://my-website-6gnpiaym0891702b-1257259254.tcloudbaseapp.com/
- The source code for evaluating existing Temporal Graph Neural Networks based on BenchTemp is in the folder experimental_codes.
Please ensure that you have installed the following dependencies:
- numpy >= 1.18.0
- pandas >= 1.2.0
- scikit-learn (imported as sklearn) >= 0.20.0
pip install benchtemp
After installing the benchtemp PyPI library, you can evaluate your TGNN models on the dynamic link prediction task and the dynamic node classification task easily and quickly. For example:
benchtemp provides lp.DataLoader, lp.RandEdgeSampler, EarlyStopMonitor, and Evaluator for the dynamic link prediction task. Users can evaluate their TGNN models with these components. See train_link_prediction.py in the folder experimental_codes for details.
Example Framework:
# Import the benchtemp library.
import benchtemp as bt
# DataLoader for the dynamic link prediction task.
data = bt.lp.DataLoader(dataset_path="./data/", dataset_name='mooc')
node_features, edge_features, full_data, train_data, val_data, test_data, new_node_val_data, new_node_test_data, new_old_node_val_data, new_old_node_test_data, new_new_node_val_data, new_new_node_test_data, unseen_nodes_num = data.load()
# For training, create a RandEdgeSampler based on the training set.
train_rand_sampler = bt.lp.RandEdgeSampler(train_data.sources, train_data.destinations)
monitor = bt.EarlyStopMonitor()
evaluator = bt.Evaluator("LP")
# Your own TGNN model or a SOTA TGNN model.
model = TGNN(parameters)
...
for epoch in range(args.epochs):
    ...
    # Sample as many negatives as there are positive interactions.
    size = len(train_data)
    _, negatives_batch = train_rand_sampler.sample(size)
    ...
    pre_positive, pre_negative = model(positive_batch, negatives_batch)
    loss = loss_function(pre_positive, pre_negative, labels)
    ...
    # Validation.
    val_ap = model(val_data)
    if monitor.early_stop_check(val_ap):
        break
    ...
# Testing.
pre = model(test_data)
test_auc, test_ap = evaluator.eval(pre, labels)
benchtemp provides nc.DataLoader, EarlyStopMonitor, and Evaluator for the dynamic node classification task. See train_node_classification.py in the folder experimental_codes for details.
# Import the benchtemp library.
import benchtemp as bt
# DataLoader for the dynamic node classification task.
data = bt.nc.DataLoader(dataset_path="./data/", dataset_name='mooc', use_validation=True)
node_features, edge_features, full_data, train_data, val_data, test_data = data.load()
monitor = bt.EarlyStopMonitor()
evaluator = bt.Evaluator("NC")
# Your own TGNN model or a SOTA TGNN model.
model = TGNN(parameters)
...
for epoch in range(args.epochs):
    ...
    # train_batch is a placeholder for one batch of training interactions.
    pred = model(train_batch)
    loss = loss_function(pred, labels)
    ...
    # Validation.
    val_auc = model(val_data)
    if monitor.early_stop_check(val_auc):
        break
    ...
# Testing.
pre = model(test_data)
test_auc = evaluator.eval(pre, labels)
The datasets that have been preprocessed by BenchTemp are available on Zenodo (https://zenodo.org/record/8267846). You can download the datasets directly and then put them into the directory './data'.
In addition, BenchTemp provides the DataPreprocessor class for preprocessing your own TGNN datasets.
Class:
DataPreprocessor(data_path: str, data_name: str)
Args:
- data_path: str - The path of the dataset.
- data_name: str - The name of the dataset.
Function:
DataPreprocessor.data_preprocess(bipartite: bool)
Args:
- bipartite: bool - Whether the temporal graph is bipartite (heterogeneous) or not (homogeneous).
Returns:
- ml_{data_name}.csv - The CSV file of the temporal graph. This file has five columns:
  - 'u': The id of the user.
  - 'i': The id of the item.
  - 'ts': The timestamp of the interaction (edge) between the user and the item.
  - 'label': The label of the interaction (edge).
  - 'idx': The index of the interaction (edge).
- ml_{data_name}.npy - The edge features corresponding to the interactions (edges) in the temporal graph.
- ml_{data_name}_node.npy - The initial node features of the temporal graph.
Example:
import benchtemp as bt
processor = bt.DataPreprocessor(data_path="./data/", data_name="mooc")
# If the dataset is a bipartite graph, i.e. the users (source nodes) and the items (destination nodes) are of different types.
processor.data_preprocess(bipartite=True)
# If the dataset is a non-bipartite graph.
processor.data_preprocess(bipartite=False)
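To make the documented file layout concrete, the sketch below parses a tiny hand-written file that has the same five columns as ml_{data_name}.csv. The three rows are made-up illustration data, not taken from any BenchTemp dataset, and read_interactions is a hypothetical helper that uses only the Python standard library; real output files may differ in details such as value types.

```python
import csv
import io

# Hypothetical contents with the five documented columns: u, i, ts, label, idx.
SAMPLE_CSV = """u,i,ts,label,idx
1,5,0.0,0,1
2,6,3.5,0,2
1,6,7.0,1,3
"""


def read_interactions(text):
    """Parse CSV text into a list of typed interaction records."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "u": int(row["u"]),          # user (source node) id
            "i": int(row["i"]),          # item (destination node) id
            "ts": float(row["ts"]),      # interaction timestamp
            "label": int(row["label"]),  # interaction label
            "idx": int(row["idx"]),      # interaction index
        })
    return rows


interactions = read_interactions(SAMPLE_CSV)
# Check that the sample rows are in chronological order.
is_sorted = all(a["ts"] <= b["ts"] for a, b in zip(interactions, interactions[1:]))
```

Each record carries one edge of the temporal graph, so the number of records equals the number of interactions.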
The class of a temporal graph. A temporal graph can be represented as an ordered sequence of temporal user-item interactions.
Class:
TemporalGraph(sources: numpy.array, destinations: numpy.array, timestamps: numpy.array, edge_idxs: numpy.array, labels: numpy.array)
Args:
- sources: numpy.array - Array of sources of Temporal Graph edges.
- destinations: numpy.array - Array of destinations of Temporal Graph edges.
- timestamps: numpy.array - Array of timestamps of Temporal Graph edges.
- edge_idxs: numpy.array - Array of edge IDs of Temporal Graph edges.
- labels: numpy.array - Array of labels of Temporal Graph edges.
Returns:
- benchtemp.TemporalGraph. A Temporal Graph.
Example:
import pandas as pd
import numpy as np
import benchtemp as bt
graph_df = pd.read_csv("dataset_path")
sources = graph_df.u.values
destinations = graph_df.i.values
edge_idxs = graph_df.idx.values
labels = graph_df.label.values
timestamps = graph_df.ts.values
# For example, the full Temporal Graph of the dataset is full_data.
full_data = bt.TemporalGraph(sources, destinations, timestamps, edge_idxs, labels)
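As a rough mental model of the container described above, the following sketch defines a hypothetical stand-in, TemporalGraphSketch, that holds the same five parallel arrays. It is not the BenchTemp class: the length check and the two helper properties are assumptions added for illustration, and plain Python lists are used in place of numpy arrays to keep the example dependency-free.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TemporalGraphSketch:
    """Hypothetical stand-in that mirrors the five documented arguments."""
    sources: List[int]
    destinations: List[int]
    timestamps: List[float]
    edge_idxs: List[int]
    labels: List[int]

    def __post_init__(self):
        # All five arrays describe the same edges, so their lengths must agree.
        lengths = {len(self.sources), len(self.destinations),
                   len(self.timestamps), len(self.edge_idxs), len(self.labels)}
        if len(lengths) != 1:
            raise ValueError("all edge arrays must have the same length")

    @property
    def n_interactions(self) -> int:
        """Number of temporal edges."""
        return len(self.sources)

    @property
    def unique_nodes(self) -> set:
        """All node ids that appear as a source or a destination."""
        return set(self.sources) | set(self.destinations)


graph = TemporalGraphSketch(
    sources=[1, 2, 1],
    destinations=[5, 6, 6],
    timestamps=[0.0, 3.5, 7.0],
    edge_idxs=[1, 2, 3],
    labels=[0, 0, 1],
)
```

Position k in every array refers to the same interaction, which is why the sketch rejects arrays of different lengths.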
The DataLoader class for the link prediction task.
In transductive link prediction, DataLoader splits the temporal graph chronologically into 70%-15%-15% for the training, validation, and test sets according to edge timestamps.
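The chronological 70%-15%-15% split can be sketched as follows. This is a minimal illustration, assuming the cut points are taken by position after sorting edges by timestamp; the library may instead cut at timestamp quantiles, which differs when timestamps are tied. The function name chronological_split and its parameters are hypothetical.

```python
def chronological_split(timestamps, train_frac=0.70, val_frac=0.15):
    """Return edge-index lists (train, val, test) ordered by timestamp."""
    # Sort edge positions by their timestamp, earliest first.
    order = sorted(range(len(timestamps)), key=lambda k: timestamps[k])
    n_train = round(len(order) * train_frac)
    n_val_end = round(len(order) * (train_frac + val_frac))
    return order[:n_train], order[n_train:n_val_end], order[n_val_end:]


# Twenty edges given in reverse chronological order to show that sorting matters.
timestamps = [float(t) for t in reversed(range(20))]
train_idx, val_idx, test_idx = chronological_split(timestamps)
```

Because the split is by time, every training edge occurs no later than any validation edge, and every validation edge no later than any test edge.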
In inductive link prediction, DataLoader performs the same split as in the transductive setting and randomly masks 10% of the nodes as unseen nodes. Any edges associated with these unseen nodes are removed from the training set. To reflect different inductive scenarios, DataLoader further generates three inductive test sets from the transductive test set by filtering edges in different ways:
- Inductive - selects edges with at least one unseen node.
- Inductive New-Old - selects edges between a seen node and an unseen node.
- Inductive New-New - selects edges between two unseen nodes.
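The masking and the three filters above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: edges are plain (source, destination) tuples, and the helper names mask_unseen_nodes, remove_unseen_from_training, and filter_inductive are hypothetical.

```python
import random


def mask_unseen_nodes(nodes, frac=0.10, seed=0):
    """Randomly pick a fraction of nodes to treat as unseen during training."""
    rng = random.Random(seed)
    pool = sorted(nodes)
    return set(rng.sample(pool, round(len(pool) * frac)))


def remove_unseen_from_training(train_edges, unseen):
    """Drop every training edge that touches an unseen node."""
    return [(s, d) for s, d in train_edges if s not in unseen and d not in unseen]


def filter_inductive(edges, unseen):
    """Split edges into the Inductive, New-Old, and New-New subsets."""
    inductive, new_old, new_new = [], [], []
    for src, dst in edges:
        n_unseen = (src in unseen) + (dst in unseen)
        if n_unseen >= 1:
            inductive.append((src, dst))   # at least one unseen endpoint
        if n_unseen == 1:
            new_old.append((src, dst))     # one seen and one unseen endpoint
        elif n_unseen == 2:
            new_new.append((src, dst))     # both endpoints unseen
    return inductive, new_old, new_new


test_edges = [(1, 2), (1, 9), (9, 10), (3, 10)]
unseen = {9, 10}
inductive, new_old, new_new = filter_inductive(test_edges, unseen)
```

In this sketch the New-Old and New-New subsets partition the Inductive subset, so their sizes add up to the size of the Inductive set.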
Class:
lp.DataLoader(dataset_path: str, dataset_name: str, different_new_nodes_between_val_and_test: bool, randomize_features: bool)
Args:
- dataset_path: str - The path of the dataset.
- dataset_name: str - The name of the dataset.
- different_new_nodes_between_val_and_test: bool - Whether the validation set and the test set use different sets of new (unseen) nodes.
- randomize_features: bool - Whether to initialize the node features randomly.
Function:
lp.DataLoader.load()
Returns:
- node_features: numpy.array - Array of the Node Features of the Temporal Graph.
- edge_features: numpy.array - Array of the Edge Features of the Temporal Graph.
- full_data: benchtemp.TemporalGraph - Full Temporal Graph dataset.
- train_data: benchtemp.TemporalGraph - The training set.
- val_data: benchtemp.TemporalGraph - The validation set.
- test_data: benchtemp.TemporalGraph - The Transductive test set.
- new_node_val_data: benchtemp.TemporalGraph - The Inductive validation set.
- new_node_test_data: benchtemp.TemporalGraph - The Inductive test set.
- new_old_node_val_data: benchtemp.TemporalGraph - The Inductive New-Old validation set.
- new_old_node_test_data: benchtemp.TemporalGraph - The Inductive New-Old test set.
- new_new_node_val_data: benchtemp.TemporalGraph - The Inductive New-New validation set.
- new_new_node_test_data: benchtemp.TemporalGraph - The Inductive New-New test set.
- unseen_nodes_num: int - The number of unseen nodes in inductive setting.
Example:
import benchtemp as bt
data = bt.lp.DataLoader(dataset_path="./data/", dataset_name='mooc')
node_features, edge_features, full_data, train_data, val_data, test_data, new_node_val_data, new_node_test_data, new_old_node_val_data, new_old_node_test_data, new_new_node_val_data, new_new_node_test_data, unseen_nodes_num = data.load()
BenchTemp provides RandEdgeSampler, a unified, seedable negative edge sampler class for the link prediction task. It samples as many negative edges as there are positive interactions.
Class:
RandEdgeSampler(src_list: numpy.array, dst_list: numpy.array, seed: int)
Args:
- src_list: numpy.array - Array of source nodes.
- dst_list: numpy.array - Array of destination nodes.
- seed: int - The random seed.
Function:
RandEdgeSampler.sample(size: int)
Args:
- size: int - The number of negative edges to sample.
Returns:
- src_list: numpy.array - Array of source nodes of the sampled negative edges.
- dst_list: numpy.array - Array of destination nodes of the sampled negative edges.
Example:
import benchtemp as bt
# For training, create a RandEdgeSampler based on the training set.
train_rand_sampler = bt.lp.RandEdgeSampler(train_data.sources, train_data.destinations)
...
for epoch in range(args.epochs):
    ...
    # Sample as many negatives as there are positive interactions.
    size = len(train_data)
    _, negatives_batch = train_rand_sampler.sample(size)
    ...
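For intuition about what a seeded random negative sampler does, here is a minimal sketch. RandomEdgeSamplerSketch is a hypothetical class and not the BenchTemp implementation; it assumes negatives are drawn uniformly, with replacement, from the distinct source and destination nodes of the edges it was built from.

```python
import random


class RandomEdgeSamplerSketch:
    """Hypothetical seeded sampler; not the BenchTemp implementation."""

    def __init__(self, src_list, dst_list, seed=None):
        # Candidate endpoints are the distinct nodes seen in the given edges.
        self.src_pool = sorted(set(src_list))
        self.dst_pool = sorted(set(dst_list))
        self.rng = random.Random(seed)

    def sample(self, size):
        """Draw `size` random (source, destination) pairs with replacement."""
        srcs = [self.rng.choice(self.src_pool) for _ in range(size)]
        dsts = [self.rng.choice(self.dst_pool) for _ in range(size)]
        return srcs, dsts


# Two samplers with the same seed produce the same negatives.
sampler_a = RandomEdgeSamplerSketch([1, 2, 1], [5, 6, 6], seed=42)
sampler_b = RandomEdgeSamplerSketch([1, 2, 1], [5, 6, 6], seed=42)
neg_src, neg_dst = sampler_a.sample(4)
```

Fixing the seed makes the sampled negatives reproducible across runs, which matters when comparing models. Note that uniform sampling of this kind can occasionally return a pair that is a true edge; this sketch does not filter such collisions.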
The DataLoader class for the node classification task. DataLoader sorts the edges by timestamp and splits the input dataset chronologically into training, validation, and test sets (70%-15%-15%).
Class:
nc.DataLoader(dataset_path: str, dataset_name: str, use_validation: bool)
Args:
- dataset_path: str - The path of the dataset.
- dataset_name: str - The name of the dataset.
- use_validation: bool - Whether to use a validation set.
Function:
nc.DataLoader.load()
Returns:
- node_features: numpy.array - Array of the Node Features of the Temporal Graph.
- edge_features: numpy.array - Array of the Edge Features of the Temporal Graph.
- full_data: benchtemp.TemporalGraph - Full Temporal Graph dataset for node classification task.
- train_data: benchtemp.TemporalGraph - The training set for node classification task.
- val_data: benchtemp.TemporalGraph - The validation set for node classification task.
- test_data: benchtemp.TemporalGraph - The test set for node classification task.
Example:
import benchtemp as bt
data = bt.nc.DataLoader(dataset_path="./data/", dataset_name='mooc', use_validation=True)
node_features, edge_features, full_data, train_data, val_data, test_data = data.load()
BenchTemp provides a unified EarlyStopMonitor to improve training efficiency and save resources.
Class:
EarlyStopMonitor(max_round: int, higher_better: bool, tolerance: float)
Args:
- max_round: int - The number of consecutive checks without improvement after which training stops.
- higher_better: bool - Whether a higher monitored value means better performance.
- tolerance: float - The minimum improvement that counts as progress.
Function:
EarlyStopMonitor.early_stop_check(curr_val: float)
Args:
- curr_val: float - The value to check for early stopping.
Returns:
- True - If training should stop, i.e. the value has not improved for max_round checks.
- False - Otherwise.
Example:
import benchtemp as bt
...
early_stopper = bt.EarlyStopMonitor(max_round=args.patience)
for epoch in range(args.epochs):
    ...
    val_ap = model(val_datasets)
    if early_stopper.early_stop_check(val_ap):
        break
    ...
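The logic behind a patience-based early-stop check can be sketched as below. EarlyStopSketch is a hypothetical class that reuses the three documented parameter names; it assumes that an improvement is an absolute gain larger than tolerance over the best value so far, which may differ from how the library measures improvement.

```python
class EarlyStopSketch:
    """Hypothetical patience-based monitor; not the BenchTemp implementation."""

    def __init__(self, max_round=3, higher_better=True, tolerance=1e-10):
        self.max_round = max_round
        self.higher_better = higher_better
        self.tolerance = tolerance
        self.best = None
        self.rounds_without_gain = 0

    def early_stop_check(self, curr_val):
        # Flip the sign so that "larger is better" holds internally.
        value = curr_val if self.higher_better else -curr_val
        if self.best is None or value - self.best > self.tolerance:
            self.best = value
            self.rounds_without_gain = 0
        else:
            self.rounds_without_gain += 1
        return self.rounds_without_gain >= self.max_round


monitor = EarlyStopSketch(max_round=2)
history = [0.70, 0.75, 0.74, 0.75, 0.80]  # made-up validation AP per epoch
stopped_at = None
for epoch, val_ap in enumerate(history):
    if monitor.early_stop_check(val_ap):
        stopped_at = epoch
        break
```

With max_round=2 the loop stops at the second consecutive epoch that fails to beat the best value, so the final 0.80 in the made-up history is never reached. A larger max_round trades extra training time for a lower risk of stopping too early.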
Different evaluation metrics are available, including Area Under the Receiver Operating Characteristic Curve (ROC AUC) and Average Precision (AP). Usually, ROC AUC and AP are used for the link prediction task, while ROC AUC is used for the node classification task.
Class:
Evaluator(task_name: str)
Args:
- task_name: str - The name of the task, one of ["LP", "NC"]: LP for the link prediction task and NC for the node classification task.
Function:
Evaluator.eval(pred_score: numpy.array, true_label: numpy.array)
Args:
- pred_score: numpy.array - Array of prediction scores.
- true_label: numpy.array - Array of true labels.
Returns:
- AUC: float - The value of the AUC.
- AP: float - The value of the AP (link prediction task only).
Example:
import benchtemp as bt
# For example, Link prediction task. Evaluation Metrics: AUC, AP.
evaluator = bt.Evaluator("LP")
...
# test data
pred_score = model(test_data)
test_auc, test_ap = evaluator.eval(pred_score, true_label)
...
import benchtemp as bt
# For example, node classification task. Evaluation Metrics: AUC.
evaluator = bt.Evaluator("NC")
...
# test data
pred_score = model(test_data)
test_auc = evaluator.eval(pred_score, true_label)
...
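To show what the two metrics measure, the sketch below computes ROC AUC and AP from scratch on four made-up predictions. The functions roc_auc and average_precision are hypothetical helpers written for clarity, not the Evaluator's code; the pairwise AUC is quadratic in the number of samples, and the AP version assumes there are no tied scores. In practice, scikit-learn's roc_auc_score and average_precision_score are the usual choice.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision measured at each positive, ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


pred_score = [0.9, 0.8, 0.7, 0.6]
true_label = [1, 0, 1, 0]
auc = roc_auc(pred_score, true_label)           # 3 of 4 pairs ranked correctly: 0.75
ap = average_precision(pred_score, true_label)  # (1/1 + 2/3) / 2 = 5/6
```

Both metrics depend only on the ranking of the scores, not on their absolute values, so they need no decision threshold.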
The BenchTemp project is looking for contributors with expertise and enthusiasm! If you would like to contribute to BenchTemp, please contact the BenchTemp team. Contributions and issues from the community are eagerly welcomed, so that together we can push TGNN research forward.