The proposed TabGraphs benchmark can be downloaded via our Zenodo record. Put the compressed `.zip` files into the `datasets` directory. To unzip a dataset `<dataset_name>`, run `unzip <dataset_name>` in your terminal.
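If you prefer to extract all downloaded archives at once, a minimal Python sketch like the one below can do it. This script is not part of the repository; it only assumes the archives were saved into `datasets/` as described above.

```python
# Convenience sketch (not part of the repository): extract every archive in datasets/.
from pathlib import Path
import zipfile

datasets_dir = Path("datasets")
for archive in datasets_dir.glob("*.zip"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(datasets_dir)
    print(f"Extracted {archive.name}")
```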
In each dataset subfolder, we provide the following files:

- `features.csv` — node features
- `targets.csv` — node targets
- `edgelist.csv` — list of edges in the graph
- `train_mask.csv`, `valid_mask.csv`, `test_mask.csv` — split masks
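As an illustration, the per-dataset files can be loaded with pandas roughly as follows. This is a minimal sketch, not repository code; it assumes the archive was extracted into `datasets/<dataset_name>` and that the CSV files have header rows.

```python
# Minimal loading sketch (illustrative, not repository code).
import pandas as pd

dataset_dir = "datasets/<dataset_name>"  # replace with an actual dataset name
features = pd.read_csv(f"{dataset_dir}/features.csv")      # node features
targets = pd.read_csv(f"{dataset_dir}/targets.csv")        # node targets
edgelist = pd.read_csv(f"{dataset_dir}/edgelist.csv")      # edges (2 or 3 columns, see info.yaml)
train_mask = pd.read_csv(f"{dataset_dir}/train_mask.csv")  # split masks (layout may differ)
valid_mask = pd.read_csv(f"{dataset_dir}/valid_mask.csv")
test_mask = pd.read_csv(f"{dataset_dir}/test_mask.csv")
```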
Besides that, we put `info.yaml` with the necessary information about the dataset:

- `dataset_name` — dataset name
- `task` — prediction task
- `metric` — metric used to evaluate predictive performance
- `num_classes` — number of classes, if applicable
- `has_unlabeled_nodes` — whether the dataset has unlabeled nodes
- `has_nans_in_num_features` — whether the dataset has NaNs in numerical features
- `graph_is_directed` — whether the graph is directed
- `graph_is_weighted` — whether the graph is weighted (if true, then `edgelist.csv` has 3 columns instead of 2)
- `target_name` — target name
- `num_feature_names` — list of numerical feature names
- `cat_feature_names` — list of categorical feature names
- `bin_feature_names` — list of binary feature names
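The metadata in `info.yaml` can drive how a dataset is parsed. Below is a minimal sketch, assuming PyYAML is installed; the variable names and the exact CSV layout are assumptions, not taken from the repository.

```python
# Sketch of using info.yaml to interpret a dataset (illustrative, not repository code).
import pandas as pd
import yaml

dataset_dir = "datasets/<dataset_name>"  # replace with an actual dataset name
with open(f"{dataset_dir}/info.yaml") as f:
    info = yaml.safe_load(f)

print(info["dataset_name"], info["task"], info["metric"])

features = pd.read_csv(f"{dataset_dir}/features.csv")
# Split the feature columns into groups according to the metadata.
num_features = features[info["num_feature_names"]]
cat_features = features[info["cat_feature_names"]]
bin_features = features[info["bin_feature_names"]]

# A weighted graph stores an additional weight column in edgelist.csv.
edgelist = pd.read_csv(f"{dataset_dir}/edgelist.csv")
expected_columns = 3 if info["graph_is_weighted"] else 2
assert edgelist.shape[1] == expected_columns
```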
Note! The proposed TabGraphs benchmark is released under the CC BY 4.0 International license.
In the `source` directory, one can also find the source code for reproducing the experiments in our paper. Note that only the `gnns` subfolder contains our original code, while the `bgnn`, `ebbs`, and `tabular` subfolders are taken from open sources and adapted to make them consistent with our experimental setup.
Further, we provide the original sources:

- `tabular` — github.com/yandex-research/tabular-dl-tabr
- `bgnn` — github.com/nd7141/bgnn
- `ebbs` — github.com/JiuhaiChen/EBBS
The only changes that were made in the original repositories are related to the logging of experimental results and the metrics used for validation.
- Run notebook `notebooks/prepare-graph-augmentation.ipynb` to prepare graph-based feature augmentations (NFA) that can be used by tabular baselines from `tabular` (a simplified sketch of this idea is shown after this list).
- Run notebook `notebooks/prepare-node-embeddings.ipynb` to prepare optional DeepWalk embeddings (DWE) for the proposed datasets that can further improve predictive performance.
- Run notebook `notebooks/convert-graph-datasets.ipynb` to convert the provided graph datasets (possibly with NFA and/or DWE) into the format required by the `tabular` baselines and the specialized models `bgnn` and `ebbs`.
- Run experiments according to the instructions provided in the corresponding directories.
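To give an intuition for graph-based feature augmentation, here is a simplified sketch that appends mean-aggregated neighbor features to each node's own features. This is only an illustration of the general technique under assumed CSV layouts (integer, zero-based node ids; numerical features); the actual augmentations are produced by `notebooks/prepare-graph-augmentation.ipynb`.

```python
# Illustrative sketch of neighbor feature aggregation (NFA), not the notebook's implementation.
# Assumes features.csv rows are ordered by node id and that the first two columns of
# edgelist.csv are integer source and target node ids.
import numpy as np
import pandas as pd

dataset_dir = "datasets/<dataset_name>"  # replace with an actual dataset name
features = pd.read_csv(f"{dataset_dir}/features.csv")
edges = pd.read_csv(f"{dataset_dir}/edgelist.csv").iloc[:, :2].to_numpy().astype(int)

num_nodes = len(features)
x = features.select_dtypes(include=np.number).to_numpy(dtype=float)

# Mean-aggregate the numerical features of each node's incoming neighbors.
neighbor_sum = np.zeros_like(x)
degree = np.zeros(num_nodes)
for src, dst in edges:
    neighbor_sum[dst] += x[src]
    degree[dst] += 1
neighbor_mean = neighbor_sum / np.clip(degree, 1, None)[:, None]

# Concatenate the original features with the aggregated neighbor features.
augmented = np.concatenate([x, neighbor_mean], axis=1)
```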
Note! The source code for the `tabular` baselines and the `bgnn` model is distributed under the MIT license, and our code for `gnns` is also released under the same MIT license.