Zeno

This is the Python implementation of the paper "Zeno++: Robust Asynchronous SGD with Arbitrary Number of Byzantine Workers".

Requirements

The following Python packages need to be installed with pip:

MXNet (we use GPUs, so mxnet-cu80 is preferred)
Gluon-CV
NumPy

Users can simply run the following command in their own virtualenv. It installs the CPU build mxnet-mkl; replace it with mxnet-cu80 to train on a GPU:

pip install --no-cache-dir numpy mxnet-mkl gluoncv
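
A quick way to confirm that the environment is ready is to check that the three packages can be found by the interpreter. The snippet below is a minimal sketch; the helper name missing_packages is hypothetical and is not part of this repository. It uses the import names (mxnet, gluoncv, numpy), which differ from some pip names such as mxnet-mkl.

```python
import importlib.util

# Import names of the required packages (assumption: pip names such as
# mxnet-mkl or mxnet-cu80 all install the module named "mxnet").
REQUIRED = ["numpy", "mxnet", "gluoncv"]


def missing_packages(names=REQUIRED):
    """Return the names in `names` that cannot be located in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing packages:", ", ".join(missing) if missing else "none")
```

If the list is non-empty, rerun the pip command above inside the same virtualenv.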

Run the demo

Options:

Option	Description
--dir	path of datasets
--batch_size 128	batch size of the workers
--nepochs 200	total number of epochs
--interval 10	log interval
--lr 0.1	learning rate
--lr-decay 0.1	factor by which the learning rate decays
--lr-decay-epoch 100,150	epochs where the learning rate decays
--classes	number of classes
--nworkers 20	number of workers
--nbyz	number of faulty workers
--byz_type	type of failures, signflip or labelflip
--byz-param-a	hyperparameter of Byzantine workers
--byz-param-b	hyperparameter of Byzantine workers
--byz-param-c	hyperparameter of Byzantine workers
--model	name of neural network
--seed 337	random seed
--max-delay 15	maximum global delay
--byz-test	Byzantine-tolerant algorithm: none, kardam, or zeno++
--rho	hyperparameter \rho of Zeno++
--epsilon	hyperparameter \epsilon of Zeno++
--zeno-delay 10	delay of g_r in Zeno++
--zeno-batchsize 10	batch size of Zeno++, n_s in the paper
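
To illustrate how --lr, --lr-decay, and --lr-decay-epoch interact, here is a minimal sketch of a step schedule. It assumes the learning rate is multiplied by the decay factor once for each listed epoch that has been reached; the exact epoch at which the training scripts apply the decay may differ. The function names step_lr and parse_decay_epochs are hypothetical and do not come from the repository.

```python
def parse_decay_epochs(text):
    """Parse a comma-separated value such as "100,150" into a tuple of ints."""
    return tuple(int(tok) for tok in text.split(","))


def step_lr(epoch, base_lr=0.1, decay=0.1, decay_epochs=(100, 150)):
    """Learning rate at `epoch` under a step schedule.

    Assumption: the rate is multiplied by `decay` once for every entry of
    `decay_epochs` that is less than or equal to `epoch`.
    """
    n_decays = sum(1 for e in decay_epochs if epoch >= e)
    return base_lr * decay ** n_decays


if __name__ == "__main__":
    epochs = parse_decay_epochs("100,150")
    for epoch in (0, 99, 100, 149, 150, 199):
        print(epoch, step_lr(epoch, decay_epochs=epochs))
```

With the defaults listed in the table, this gives a rate of about 0.1 before epoch 100, about 0.01 from epoch 100, and about 0.001 from epoch 150.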

Train with 10 workers, 6 of which are faulty with sign-flipping failures, using Zeno++ as the Byzantine-tolerant algorithm:

python train_cifar10.py --classes 10 --model default --nworkers 10 --nbyz 6 --byz-type signflip --byz-test zeno++ --rho 0.001 --epsilon 0 --zeno-delay 10 --batchsize 128 --lr 0.1 --lr-decay 0.1 --lr-decay-epoch 100,150 --epochs 200 --seed 337 --max-delay 10 --dir $inputdir --log $logfile 2>&1 | tee $watchfile
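
For intuition about what --rho and --epsilon control, the sketch below shows one reading of the Zeno++ acceptance test. This README does not state the rule, so the details are assumptions: the candidate gradient g is rescaled to the norm of the server's validation gradient v, and it is accepted when lr * <v, g> - rho * ||g||^2 >= -lr * epsilon. The function name zeno_pp_accept, the plain-list vectors, and the choice to reject a zero gradient are illustrative and are not taken from train_cifar10.py.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences of floats."""
    return sum(x * y for x, y in zip(a, b))


def zeno_pp_accept(g, v, lr, rho, epsilon):
    """Return (accepted, rescaled_g) for candidate gradient g.

    Assumed rule (a sketch, not the repository's code):
      1. rescale g so that ||g|| equals ||v||, the validation gradient norm;
      2. accept if lr * <v, g> - rho * ||g||^2 >= -lr * epsilon.
    """
    g_norm = math.sqrt(dot(g, g))
    if g_norm == 0.0:
        # Illustrative choice: a zero candidate carries no information, so reject it.
        return False, list(g)
    scale = math.sqrt(dot(v, v)) / g_norm
    g_scaled = [scale * x for x in g]
    score = lr * dot(v, g_scaled) - rho * dot(g_scaled, g_scaled)
    return score >= -lr * epsilon, g_scaled


if __name__ == "__main__":
    v = [1.0, 2.0]                        # validation gradient held by the server
    honest = [0.9, 2.1]                   # roughly aligned with v
    flipped = [-x for x in honest]        # sign-flipping failure
    print(zeno_pp_accept(honest, v, lr=0.1, rho=0.001, epsilon=0.0)[0])
    print(zeno_pp_accept(flipped, v, lr=0.1, rho=0.001, epsilon=0.0)[0])
```

Under this rule, a sign-flipped gradient has a negative inner product with v and is filtered out, while a gradient aligned with v passes. A larger rho or a smaller epsilon makes the test stricter.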

More detailed commands and instructions can be found in the demo script experiment_script_1.sh.


xcgoner/iclr2020_zeno_async
