Safe and Robust Watermark Injection with a Single OoD Image

Code for our ICLR 2024 paper "Safe and Robust Watermark Injection with a Single OoD Image" by Shuyang Yu, Junyuan Hong, Haobo Zhang, Haotao Wang, Zhangyang Wang, and Jiayu Zhou.

Overview

Training a high-performance deep neural network requires large amounts of data and computational resources. Protecting the intellectual property (IP) and commercial ownership of a deep model is challenging yet increasingly crucial. A major stream of watermarking strategies implants verifiable backdoor triggers by poisoning training samples, but these are often unrealistic due to data privacy and safety concerns and are vulnerable to minor model changes such as fine-tuning. To overcome these challenges, we propose a safe and robust backdoor-based watermark injection technique that leverages the diverse knowledge from a single out-of-distribution (OoD) image, which serves as a secret key for IP verification. Independence from the training data makes the method agnostic to third-party promises of IP security. We induce robustness via random perturbation of model parameters during watermark injection to defend against common watermark removal attacks, including fine-tuning, pruning, and model extraction. Our experimental results demonstrate that the proposed watermarking approach is not only time- and sample-efficient without training data, but also robust against the watermark removal attacks above.

Preparation

Run conda env create -f environment.yml to create a conda environment. Major dependencies include pytorch, torchvision, wandb, and numpy.
Pre-train models on CIFAR10, CIFAR100, and GTSRB yourself, or download our pre-trained models from Google Drive.
Specify the root path of the pre-trained models in utils/config.py.
Sign up for wandb and set it up by running wandb login with your API key from the website. See the wandb documentation for detailed instructions.
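
Before launching any sweep, it can help to confirm that the major dependencies are importable in the active environment. The following is a minimal sketch and not part of the repository: the helper name missing_packages is hypothetical, and it assumes pytorch is importable under its usual module name torch.

```python
import importlib.util

# Import names of the major dependencies listed above
# (assumption: pytorch is imported as "torch").
REQUIRED = ["torch", "torchvision", "wandb", "numpy"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All major dependencies are installed.")
```

If anything is reported missing, re-creating the environment from environment.yml is likely the simplest fix.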

Watermark injection

Generate a surrogate OoD dataset (trigger set) from a single OoD image

Step 1: Surrogate OoD data generation

cd data_generation

and then generate the dataset according to data_generation/README.md. Note that --targetpath will be the path to store your trigger set. Our default OoD image is images/ameyoko.jpg.

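Step 2: Label the surrogate dataset using a pre-trained model. For example, the following command registers a sweep for a CIFAR100 model; wandb then prints a wandb agent command that you run to start the labeling job: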

wandb sweep run_sweeps/cifar100_wrn_poi_one_image_label.yml
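
Conceptually, labeling means asking the pre-trained model for a prediction on every generated sample and storing that prediction as the sample's label. The framework-free sketch below illustrates the idea under the assumption that the label is the argmax of the model's output scores; the function name pseudo_label and the toy scores are hypothetical, and the repository's own labeling code (built on PyTorch) may differ.

```python
def pseudo_label(logits_batch):
    """Assign each sample the index of its largest score (argmax)."""
    labels = []
    for logits in logits_batch:
        best = max(range(len(logits)), key=lambda i: logits[i])
        labels.append(best)
    return labels


# Toy model outputs for three surrogate samples over four classes
# (made-up numbers, standing in for a pre-trained model's logits).
toy_logits = [
    [0.1, 2.3, -0.5, 0.0],
    [1.7, 0.2, 0.9, 1.1],
    [-1.0, -0.2, -0.3, 3.4],
]

print(pseudo_label(toy_logits))  # [1, 0, 3]
```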

Watermark injection

You can inject the watermark with the generated OoD dataset using the following commands.

For a model pre-trained on CIFAR10:

wandb sweep run_sweeps/cifar10_wrn_poi_one_image_distill_poisontrain.yml

For a model pre-trained on CIFAR100:

wandb sweep run_sweeps/cifar100_wrn_poi_one_image_distill_poisontrain.yml

For a model pre-trained on GTSRB:

wandb sweep run_sweeps/gtsrb_resnet18_poi_one_image_distill_poisontrain.yml

The trigger patterns used in the paper are:

BadNets with grid (badnet_grid), l0-invisible (l0_inv), smooth (smooth), Trojan Square 3×3 (trojan_3×3), Trojan Square 8×8 (trojan_8×8), and Trojan watermark (trojan_wm).
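
To make the idea of a patch trigger concrete, here is a minimal sketch of stamping a square pattern onto an image and relabeling it with a target class. It is an illustration only: the image is a plain 2D list of grayscale values, and the function names, the bottom-right placement, and the pixel value 1.0 are all assumptions. The patterns actually used by the repository may differ in position, color, and blending.

```python
import copy


def apply_square_trigger(image, size=3, value=1.0):
    """Return a copy of `image` with a size x size square in the bottom-right corner."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("trigger is larger than the image")
    stamped = copy.deepcopy(image)  # leave the caller's image untouched
    for row in range(height - size, height):
        for col in range(width - size, width):
            stamped[row][col] = value
    return stamped


def poison(image, target_label, size=3):
    """Pair the triggered image with the (hypothetical) backdoor target label."""
    return apply_square_trigger(image, size=size), target_label


# A 5x5 all-black toy image receives a 3x3 white square.
toy_image = [[0.0] * 5 for _ in range(5)]
triggered, label = poison(toy_image, target_label=0)
for row in triggered:
    print(row)
```

Returning a copy rather than modifying the input keeps the clean sample available, which is convenient when both clean and triggered versions are needed for verification.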

Evaluation against watermark removal attacks

For example, to evaluate the robustness of a watermarked model against watermark removal attacks:

wandb sweep run_sweeps/cifar10_wrn_poi_one_image_evaluate.yml

The parameters for different watermark removal attacks are shown in this table:

Attack types	args.method	args.adversary	args.prune_ratio
FT-AL	finetune	ftal
FT-LL	finetune	ftll
RT-AL	finetune	rtal
Pruning-50%	finetune	prune	0.5
Model extraction (knockoff)	extraction	knockoff
OoD detection	detection	energy
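
The table above can also be kept as a small lookup, which is handy when scripting several evaluations. The sketch below is an assumption-laden illustration rather than repository code: the names ATTACKS and attack_args are hypothetical, and prune_ratio defaulting to None for attacks where the table leaves it blank is a choice made here.

```python
from argparse import Namespace

# Attack settings transcribed from the table above.
ATTACKS = {
    "FT-AL": {"method": "finetune", "adversary": "ftal"},
    "FT-LL": {"method": "finetune", "adversary": "ftll"},
    "RT-AL": {"method": "finetune", "adversary": "rtal"},
    "Pruning-50%": {"method": "finetune", "adversary": "prune", "prune_ratio": 0.5},
    "Model extraction (knockoff)": {"method": "extraction", "adversary": "knockoff"},
    "OoD detection": {"method": "detection", "adversary": "energy"},
}


def attack_args(name):
    """Build an args-like object with method, adversary, and prune_ratio fields."""
    params = ATTACKS[name]
    return Namespace(
        method=params["method"],
        adversary=params["adversary"],
        # Assumption: None when the table gives no prune ratio.
        prune_ratio=params.get("prune_ratio"),
    )


if __name__ == "__main__":
    for attack in ATTACKS:
        print(attack, attack_args(attack))
```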

Citation

@inproceedings{yu2023safe,
  title={Safe and Robust Watermark Injection with a Single OoD Image},
  author={Yu, Shuyang and Hong, Junyuan and Zhang, Haobo and Wang, Haotao and Wang, Zhangyang and Zhou, Jiayu},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}
