FedSim is a coupled vertical federated learning framework that boosts the training with record similarities.
- Install conda 4.14 following https://www.anaconda.com/products/distribution
- Clone this repo by
git clone https://github.com/JerryLife/FedSim.git
- Create environment (named fedsim) and install required basic modules.
conda env create -f environment.yml
conda activate fedsim
- Install torch and torchvision with pip according to your CUDA version. For RTX 3090, we installed torch==1.8.2 and torchvision==0.9.2 as below.
pip3 install torch==1.8.2 torchvision==0.9.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
- Ensure all the required folders are created (which should exist upon git clone).
mkdir -p runs ckp log cache
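The setup steps above can be double-checked from Python. The following is a minimal sketch and not part of the repository: the helper names `ensure_dirs` and `torch_summary` are hypothetical. It assumes only the folder names from the mkdir command and the standard `torch.__version__` and `torch.cuda.is_available()` attributes.

```python
import importlib.util
from pathlib import Path

# Folder names taken from the mkdir command above.
REQUIRED_DIRS = ("runs", "ckp", "log", "cache")


def ensure_dirs(root="."):
    """Create the required folders under root (like `mkdir -p`) and return their paths."""
    paths = [Path(root) / name for name in REQUIRED_DIRS]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


def torch_summary():
    """Report the installed torch version and CUDA visibility, if torch is importable."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"


if __name__ == "__main__":
    for path in ensure_dirs():
        print("ok:", path)
    print(torch_summary())
```

Run it from the repository root with the fedsim environment activated, so that the folders are created next to src/ and the reported torch build is the one the training scripts will use.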
Due to the size limit, this repo includes only two datasets, house and hdb, in the data/ folder.
data
├── beijing (house)
│   ├── airbnb_clean.csv (Secondary)
│   └── house_clean.csv (Primary)
└── hdb (hdb)
    ├── hdb_clean.csv (Primary)
    └── school_clean.csv (Secondary)
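To confirm that the bundled files are in place before training, a small check such as the one below may help. It is a sketch under the assumption that the layout matches the tree above exactly. The `EXPECTED` mapping and the `missing_files` helper are illustrative names, not part of the repository.

```python
from pathlib import Path

# Expected layout, copied from the tree above: folder -> (primary, secondary).
EXPECTED = {
    "beijing": ("house_clean.csv", "airbnb_clean.csv"),
    "hdb": ("hdb_clean.csv", "school_clean.csv"),
}


def missing_files(data_root="data"):
    """Return the expected CSV paths that do not exist under data_root."""
    root = Path(data_root)
    return [
        root / folder / name
        for folder, names in EXPECTED.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing:", ", ".join(str(p) for p in missing))
    else:
        print("All bundled dataset files are present.")
```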
The linkage and training of each dataset are combined in a single script.
The scripts without added noise are located under src/ and are named in the format src/train_<dataset>_<algorithm>.py. You can run each script by
python src/train_<dataset>_<algorithm>.py [-g gpu_index] [-p perturbed_noise_on_similarity] [-k number_of_neighbors] [--mlp-merge] [-ds] [-dw]
- -g/--gpu: GPU index to run this script. If the GPU with this index is not available, the CPU is used instead.
- -k/--top-k: Number of neighbors to extract from possible matches, which should be less than the value of "knn_k" ($K$ in the paper).
- -p/--leak-p: The probability of leakage of bloom filters ($\tau$ in the paper).
- --mlp-merge: Whether to replace the CNN merge model with an MLP merge model.
- -ds/--disable-sort: Whether to disable the sort gate.
- -dw/--disable-weight: Whether to disable the weight gate.
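The flags above could be declared with argparse roughly as follows. This is only a sketch of how such an interface might look: the defaults, help strings, and the `build_parser` name are assumptions, and the actual scripts in src/ may define their arguments differently.

```python
import argparse


def build_parser():
    """Parser mirroring the documented flags; defaults here are assumptions."""
    parser = argparse.ArgumentParser(description="Sketch of a FedSim training CLI")
    parser.add_argument("-g", "--gpu", type=int, default=0,
                        help="GPU index; CPU is used if it is unavailable")
    parser.add_argument("-k", "--top-k", type=int, default=None,
                        help="number of neighbors to extract (K in the paper)")
    parser.add_argument("-p", "--leak-p", type=float, default=1.0,
                        help="bloom-filter leakage probability (tau in the paper)")
    parser.add_argument("--mlp-merge", action="store_true",
                        help="use an MLP merge model instead of the CNN one")
    parser.add_argument("-ds", "--disable-sort", action="store_true",
                        help="disable the sort gate")
    parser.add_argument("-dw", "--disable-weight", action="store_true",
                        help="disable the weight gate")
    return parser


if __name__ == "__main__":
    # Parse a sample command line and show the resulting namespace.
    args = build_parser().parse_args(["-g", "1", "-p", "1e0", "-k", "5", "-ds"])
    print(vars(args))
```

The three gate and merge options are plain switches, so they take no value: passing -ds turns the sort gate off, and omitting it leaves the gate on.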
Taking the house dataset as an example:
python src/train_beijing_fedsim.py -g 1 -p 1e0 -k 5 -ds
runs FedSim on the house dataset with $\tau=1$ and $K=5$ on GPU 1, with the sort gate disabled.
The scripts with added noise are located in src/priv_scripts and follow the same naming format as the scripts without noise. The only differences are some hyperparameter settings. You can run these scripts with a similar command. For example,
python src/priv_scripts/train_beijing_fedsim.py -g 1 -p 1e-2 -k 5 -ds
runs FedSim on the house dataset with noise satisfying $\tau=0.01$.
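When launching several runs, the command line can be assembled programmatically from the naming scheme described above. The sketch below is illustrative only: `build_command` is a hypothetical helper, the swept values of K are arbitrary (they should stay below "knn_k"), and the commands are printed rather than executed.

```python
import shlex
import sys
from pathlib import Path


def build_command(dataset, algorithm, gpu, leak_p, top_k,
                  private=False, disable_sort=False):
    """Assemble the argv list for one run, following the naming scheme above."""
    # Noise-added scripts live in src/priv_scripts, the others directly in src/.
    script_dir = Path("src") / "priv_scripts" if private else Path("src")
    script = script_dir / f"train_{dataset}_{algorithm}.py"
    cmd = [sys.executable, script.as_posix(),
           "-g", str(gpu), "-p", str(leak_p), "-k", str(top_k)]
    if disable_sort:
        cmd.append("-ds")
    return cmd


if __name__ == "__main__":
    # Print a small sweep over K; pass a command to subprocess.run to execute it.
    for k in (3, 5):
        cmd = build_command("beijing", "fedsim", gpu=1, leak_p=1e-2, top_k=k,
                            private=True, disable_sort=True)
        print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be handed to `subprocess.run` from the repository root without shell quoting issues.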
@inproceedings{NEURIPS2022_84b74416,
author = {Wu, Zhaomin and Li, Qinbin and He, Bingsheng},
booktitle = {Advances in Neural Information Processing Systems},
editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
pages = {21087--21100},
publisher = {Curran Associates, Inc.},
title = {A Coupled Design of Exploiting Record Similarity for Practical Vertical Federated Learning},
url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/84b744165a0597360caad96b06e69313-Paper-Conference.pdf},
volume = {35},
year = {2022}
}