This repository contains the implementation code for our NeurIPS 2024 paper "Decoupled Kullback-Leibler (DKL) Divergence Loss" (arXiv:2305.13948).
In this paper, we delve deeper into the Kullback–Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss, which consists of 1) a weighted Mean Square Error ($\mathbf{w}$MSE) loss and 2) a Cross-Entropy loss incorporating soft labels. Thanks to the decomposed formulation of the DKL loss, we identify two areas for improvement. First, we address the limitation of KL/DKL in scenarios like knowledge distillation by breaking its asymmetric optimization property. This modification ensures that the $\mathbf{w}$MSE component is always effective during training, providing extra constructive cues. Second, we introduce class-wise global information into KL/DKL to mitigate bias from individual samples. With these two enhancements, we derive the Improved Kullback–Leibler (IKL) Divergence loss and evaluate its effectiveness through experiments on the CIFAR-10/100 and ImageNet datasets, focusing on adversarial training and knowledge distillation tasks. The proposed approach achieves new state-of-the-art adversarial robustness on the public RobustBench leaderboard and competitive performance on knowledge distillation, demonstrating its substantial practical merits.
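The decomposition described above can be checked numerically on a toy example. The following is a minimal pure-Python sketch, not the implementation shipped in this repository. It reads "equivalent" as "yields the same gradients", and it assumes a specific form for the $\mathbf{w}$MSE term: pairwise logit differences, weighted by products of softmax probabilities, scaled by 1/4, with the weights and the other branch's logits held constant. All function names and the toy logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(o_m, o_n):
    """KL(softmax(o_m) || softmax(o_n))."""
    s_m, s_n = softmax(o_m), softmax(o_n)
    return sum(p * (math.log(p) - math.log(q)) for p, q in zip(s_m, s_n))

def wmse_term(o_m, o_n, weights):
    """Assumed wMSE form: 1/4 * sum_jk w[j][k] * (dm_jk - dn_jk)^2,
    where dm_jk = o_m[j] - o_m[k] and the weights are constants."""
    c = len(o_m)
    total = 0.0
    for j in range(c):
        for k in range(c):
            diff = (o_m[j] - o_m[k]) - (o_n[j] - o_n[k])
            total += weights[j][k] * diff * diff
    return 0.25 * total

def soft_label_ce(soft_labels, o_n):
    """Cross-entropy of softmax(o_n) against fixed soft labels."""
    s_n = softmax(o_n)
    return -sum(p * math.log(q) for p, q in zip(soft_labels, s_n))

def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

# Toy logits for the two branches (hypothetical values).
o_m = [2.0, 0.5, -1.0]
o_n = [0.3, 1.2, -0.4]

# Soft labels and pairwise weights are computed once and then frozen,
# which plays the role of a stop-gradient in this sketch.
s_m = softmax(o_m)
weights = [[a * b for b in s_m] for a in s_m]

# Gradients with respect to o_m: full KL versus the wMSE term alone.
grad_kl_m = numeric_grad(lambda x: kl_divergence(x, o_n), o_m)
grad_wmse_m = numeric_grad(lambda x: wmse_term(x, o_n, weights), o_m)

# Gradients with respect to o_n: full KL versus soft-label cross-entropy.
grad_kl_n = numeric_grad(lambda x: kl_divergence(o_m, x), o_n)
grad_ce_n = numeric_grad(lambda x: soft_label_ce(s_m, x), o_n)

print("d/do_m  KL:", grad_kl_m, " wMSE:", grad_wmse_m)
print("d/do_n  KL:", grad_kl_n, " CE:", grad_ce_n)
```

Under these assumptions, the $\mathbf{w}$MSE term accounts for the gradient that KL sends to the first set of logits, and the soft-label cross-entropy accounts for the gradient sent to the second set. For actual training, use the loss code in the task directories of this repository.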
For knowledge distillation, please refer to KD-dkl for training and evaluation.
For knowledge distillation on imbalanced data, please refer to Imbalanced-KD-dkl for training and evaluation.
As of 2023/05/20, with the IKL loss, we achieve new state-of-the-art adversarial robustness under AutoAttack, both with and without augmentation strategies.
For adversarial training, please refer to Adv-training-dkl for training and evaluation.
For semi-supervised learning, please refer to Semi-Supervised-Learning-dkl for training and evaluation.
If you have any questions, feel free to contact us by email or through GitHub issues. Enjoy!
If you find this code or idea useful, please consider citing our related work:
@article{cui2023decoupled,
title={Decoupled Kullback-Leibler Divergence Loss},
author={Cui, Jiequan and Tian, Zhuotao and Zhong, Zhisheng and Qi, Xiaojuan and Yu, Bei and Zhang, Hanwang},
journal={arXiv preprint arXiv:2305.13948},
year={2023}
}
@inproceedings{cui2021learnable,
title={Learnable boundary guided adversarial training},
author={Cui, Jiequan and Liu, Shu and Wang, Liwei and Jia, Jiaya},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={15721--15730},
year={2021}
}
@article{10130611,
title={Generalized Parametric Contrastive Learning},
author={Cui, Jiequan and Zhong, Zhisheng and Tian, Zhuotao and Liu, Shu and Yu, Bei and Jia, Jiaya},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2023},
pages={1--12},
doi={10.1109/TPAMI.2023.3278694}
}
@inproceedings{cui2021parametric,
title={Parametric contrastive learning},
author={Cui, Jiequan and Zhong, Zhisheng and Liu, Shu and Yu, Bei and Jia, Jiaya},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={715--724},
year={2021}
}