This repository implements Neural Fingerprinting, a technique for detecting adversarial examples.
It accompanies the paper "Detecting Adversarial Examples via Neural Fingerprinting" by Sumanth Dathathri(*), Stephan Zheng(*), Richard Murray and Yisong Yue, 2018 (* = equal contribution), available on arXiv as 1803.03870.
If you use this code or work, please cite:
title = {Detecting Adversarial Examples via Neural Fingerprinting},
author={Dathathri, Sumanth and Zheng, Stephan and Murray, Richard and Yue, Yisong},
year = {2018},
eprint = {1803.03870},
ee = {}
To clone the repository, run:
git clone
cd neural-fingerprinting
Neural Fingerprinting achieves near-perfect detection rates on MNIST, CIFAR and MiniImageNet-20.
ROC curves for detection of different attacks on CIFAR.
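As a rough illustration of how a detection ROC curve is summarized, the sketch below computes the area under the curve (AUC) from detector scores. It assumes the detector outputs one scalar score per input, with higher scores for inputs it considers adversarial. The function name `roc_auc` and the toy scores are made up for illustration and are not results from the paper.

```python
def roc_auc(scores_clean, scores_adv):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random adversarial input receives a
    higher score than a random clean input, counting ties as one half.
    """
    wins = 0.0
    for adv in scores_adv:
        for clean in scores_clean:
            if adv > clean:
                wins += 1.0
            elif adv == clean:
                wins += 0.5
    return wins / (len(scores_adv) * len(scores_clean))


# Toy scores (hypothetical): one adversarial example overlaps the clean range.
clean_scores = [0.1, 0.2, 0.3, 0.4]
adv_scores = [0.35, 0.8, 0.9]
print(round(roc_auc(clean_scores, adv_scores), 3))  # -> 0.917
```

An AUC of 1.0 means the scores separate clean and adversarial inputs perfectly, while 0.5 corresponds to chance.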
We have tested this codebase with the following dependencies (we cannot guarantee compatibility with other versions).
- PyTorch >= 0.2 (torch 0.2.0.post3, torchvision 0.1.9)
- TensorFlow >= 1.4.1 (tensorflow 1.4.1)
- Keras 2.0.8
- nn-transfer: to transfer models from TensorFlow to PyTorch.
- scikit-learn
To install these dependencies, run:
# PyTorch: see the official PyTorch installation instructions for details
pip install torch
pip install torchvision
# TensorFlow and Keras: see the official TensorFlow installation instructions for details
pip install keras
pip install tensorflow-gpu
# nn_transfer
git clone
cd nn-transfer
pip install .
pip install scikit-learn
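After installing, it can help to confirm which versions are actually present, since compatibility is only reported for the versions listed above. The following is a minimal sketch, not part of the repository. The helper name `check_dependencies` is hypothetical, and the module list assumes the usual import names (scikit-learn is imported as `sklearn`).

```python
import importlib

# Import names of the dependencies listed above.
REQUIRED_MODULES = ["torch", "torchvision", "tensorflow", "keras", "sklearn"]


def check_dependencies(module_names):
    """Return {module name: version string, or None if it cannot be imported}."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A missing package raises ImportError; a broken install can
            # raise other errors, so treat any failure as "not usable".
            report[name] = None
        else:
            # Not every package exposes __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_dependencies(REQUIRED_MODULES).items():
        status = version if version is not None else "NOT AVAILABLE"
        print("{}: {}".format(name, status))
```

Compare the printed versions against the tested ones before training.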
This codebase relies on third-party implementations for adversarial attacks and code to transfer generated attacks from Tensorflow to PyTorch.
- Local Intrinsic Dimensionality (LID) for Adversarial Subspace Detection: a library used to generate the adversarial attacks.
- Cleverhans: a library to generate gradient-based attacks, called by the LID code. A copy of this codebase is included in this repository.
- Code to generate iterative fast-gradient attacks on ImageNet examples.
To train and evaluate models with fingerprints, use the launcher script, which contains example calls to run the code.
The flags that can be set for the launcher are:
./ dataset train attack eval grid num_dx eps epoch_for_eval
- dataset: 'mnist', 'cifar' or 'miniimagenet'
- train: 'train' or 'notrain' -- do training or not
- attack: 'attack' or 'noattack' -- create adversarial examples or not
- eval: 'eval' or 'noeval' -- do evaluation or not
- grid: 'grid' or 'nogrid' -- enables a grid search for hyperparameter tuning.
- num_dx: number of fingerprint directions
- eps: standard deviation of randomly sampled fingerprint directions
- epoch_for_eval: which model epoch to use for evaluation
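The eight positional flags above can be read as a small typed configuration. The sketch below is illustrative only and is not the launcher's actual parsing logic. The names `LauncherArgs` and `parse_launcher_args` are hypothetical, and it assumes every on/off flag follows the `name` / `noname` pattern (for example `grid` / `nogrid`).

```python
from dataclasses import dataclass

DATASETS = ("mnist", "cifar", "miniimagenet")


@dataclass
class LauncherArgs:
    dataset: str
    train: bool
    attack: bool
    evaluate: bool
    grid: bool
    num_dx: int          # number of fingerprint directions
    eps: float           # scale of the sampled fingerprint directions
    epoch_for_eval: int  # which model epoch to evaluate


def _switch(value, name):
    """Map 'name' to True and 'noname' to False; reject anything else."""
    if value == name:
        return True
    if value == "no" + name:
        return False
    raise ValueError("expected '{0}' or 'no{0}', got '{1}'".format(name, value))


def parse_launcher_args(argv):
    """Validate the positional arguments in the order the launcher expects."""
    if len(argv) != 8:
        raise ValueError("expected 8 positional arguments, got {}".format(len(argv)))
    dataset, train, attack, evaluate, grid, num_dx, eps, epoch = argv
    if dataset not in DATASETS:
        raise ValueError("dataset must be one of {}, got '{}'".format(DATASETS, dataset))
    return LauncherArgs(
        dataset=dataset,
        train=_switch(train, "train"),
        attack=_switch(attack, "attack"),
        evaluate=_switch(evaluate, "eval"),
        grid=_switch(grid, "grid"),
        num_dx=int(num_dx),
        eps=float(eps),
        epoch_for_eval=int(epoch),
    )


print(parse_launcher_args("cifar train noattack noeval nogrid 30 0.05 50".split()))
```

Validating the flags up front catches a misplaced argument before a long training run starts.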
For instance, the following command trains a convolutional neural network on MNIST with 10 fingerprint directions and eps = 0.1, creates adversarial examples, and evaluates the model from epoch 10 of training:
./ mnist train attack eval nogrid 10 0.1 10
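To make `num_dx = 10` and `eps = 0.1` in the command above concrete, the sketch below draws 10 random directions for flattened 28x28 MNIST images. It assumes zero-mean Gaussian entries with standard deviation `eps`, which is one reading of the flag description. The repository may sample its directions differently, and the function name `sample_fingerprint_directions` is hypothetical.

```python
import random


def sample_fingerprint_directions(num_dx, eps, dim, seed=0):
    """Draw num_dx random directions of length dim.

    Assumption: each entry is sampled independently from N(0, eps^2).
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, eps) for _ in range(dim)] for _ in range(num_dx)]


# 10 directions with eps = 0.1 for flattened 28x28 images (784 pixels each).
directions = sample_fingerprint_directions(num_dx=10, eps=0.1, dim=28 * 28)
print(len(directions), len(directions[0]))  # -> 10 784
```

Plain Python lists keep the sketch dependency-free. In practice the directions would be stored as tensors with the same shape as the model input.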
- To train a model with fingerprints:
mkdir -p $LOGDIR
mkdir -p $DATADIR
python $NAME/ \
--batch-size 128 \
--test-batch-size 128 \
--epochs $NUM_EPOCHS \
--lr 0.01 \
--momentum 0.9 \
--seed 0 \
--log-interval 10 \
--log-dir $LOGDIR \
--data-dir $DATADIR \
--eps=$EPS \
--num-dx=$NUMDX \
--num-class=10
- To create adversarial examples for the model after 10 epochs of training (EPOCH=10):
python $NAME/ \
--attack "all" \
--ckpt $LOGDIR/ckpt/state_dict-ep_$EPOCH.pth \
--log-dir $ADV_EX_DIR \
--batch-size 128
- To evaluate the model:
mkdir -p $EVAL_LOGDIR
python $NAME/ \
--batch-size 128 \
--epochs 100 \
--lr 0.001 \
--momentum 0.9 \
--seed 0 \
--log-interval 10 \
--ckpt $LOGDIR/ckpt/state_dict-ep_$EPOCH.pth \
--log-dir $EVAL_LOGDIR \
--fingerprint-dir $LOGDIR \
--adv-ex-dir $ADV_EX_DIR \
--data-dir $DATADIR \
--eps=$EPS \
--num-dx=$NUMDX \
--num-class=10
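Both the attack and evaluation commands above load a checkpoint from `$LOGDIR/ckpt/state_dict-ep_$EPOCH.pth`, so they fail if training has not yet reached that epoch. The sketch below builds the same path and checks that the file exists before a later stage is launched. The helper names are hypothetical and not part of the repository.

```python
import os


def checkpoint_path(log_dir, epoch):
    """Build the checkpoint path using the naming pattern from the commands above."""
    return os.path.join(log_dir, "ckpt", "state_dict-ep_{}.pth".format(epoch))


def require_checkpoint(log_dir, epoch):
    """Return the checkpoint path, or raise if the file does not exist yet."""
    path = checkpoint_path(log_dir, epoch)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "no checkpoint at {}; train up to epoch {} first".format(path, epoch)
        )
    return path


# Example log directory (hypothetical).
print(checkpoint_path("logs/mnist", 10))
```

Calling `require_checkpoint` before the attack or evaluation step turns a late failure inside the script into an immediate, readable error.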