Visual adversarial attacks and defenses

Deep neural networks (DNNs) have been shown to be successful in several vision tasks, such as image classification, object detection, semantic segmentation, optical flow estimation, and video classification. However, DNNs are sensitive to perturbations of the input data that produce so-called adversarial examples, which lead DNNs to erroneous predictions. The study of data alterations designed to evade a classifier is not new: techniques that mislead classifiers have been discussed for over two decades and include attacks on fraud detection systems, spam filters, and specific classifiers such as Support Vector Machines. More recently, there has been a growing interest in adversarial examples for DNNs in visual tasks.

Adversarial examples in visual tasks are generated by modifying pixel values with carefully crafted additive noise that is imperceptible to the human eye, or by replacing image regions (rectangular or circular) or the border of an image. Adversarial examples help investigate and improve the robustness of DNN models, as well as protect private information in images. An adversarial attack can be targeted or untargeted. Targeted attacks modify an image or a video so that the DNN model predicts a specified class label, such as an object type or a predefined object trajectory in subsequent frames. Untargeted attacks modify a source image or video so that it is assigned any incorrect label other than the original one or, in the case of videos, so that the perturbation generates incorrect bounding boxes that mislead a tracker. Carefully modified stop signs, for example, can cause a false negative detection or the incorrect detection of another object type.

The chapter Visual adversarial attacks and defenses in the book Advanced Methods and Deep Learning in Computer Vision presents the problem definition of adversarial attacks for visual tasks that use images or videos as input, their main properties and the types of perturbation they generate, as well as the target models and datasets used to craft adversarial attacks in the attack scenarios. Specifically, the chapter covers adversarial attacks for image processing tasks, image classification, semantic segmentation and object detection, object tracking, and video classification, as well as defenses devised against these adversarial attacks. On this webpage, we summarise the properties of visual adversarial attacks and provide the tables, extracted from the chapter, that summarise adversarial attacks for tasks that use images as input, adversarial attacks for tasks that use videos as input, and defenses against adversarial attacks.



Illustrative example of adversarial attack for image classification. The original image is classified with the label hare by the target model (top), while the perturbed image (adversarial example), obtained with the Basic Iterative Method attack, is classified with the label armadillo by the same model (bottom). Note that the target model is an illustrative and abstract representation of the Inception V3 classifier that is trained on ImageNet. Note also that the magnitude of the perturbation is scaled up to 20 times larger than the real one for visualization purposes.
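
As a concrete illustration of the generation process, the sketch below implements the Basic Iterative Method (BIM, also known as I-FGSM) against a pretrained Inception V3 classifier, matching the setting of the figure. The model choice, the perturbation budget eps, the step size alpha and the number of steps are illustrative assumptions, and the ImageNet input normalization is omitted for brevity.

```python
import torch
import torchvision.models as models

# Pretrained ImageNet classifier, as in the figure (illustrative choice).
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.eval()

def bim_attack(x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Untargeted BIM: repeated gradient-sign steps, projected back into an
    L-infinity ball of radius eps around the original image x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # step along the gradient sign
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # stay within the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                          # keep a valid pixel range
    return x_adv.detach()

# Usage (x: a 1x3x299x299 image tensor in [0, 1], y: its label as a 1-element tensor):
# x_adv = bim_attack(x, y)
# print(model(x).argmax(1), model(x_adv).argmax(1))  # the two labels should differ
```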


Do you want to include your attack or defense?
Submit the attack/defense using the link under the corresponding table!



Properties of an adversarial attack

Effectiveness
The effectiveness of an adversarial attack is the degree to which it succeeds in misleading a machine learning model. Effectiveness can be measured as the accuracy of the target model over a target dataset of adversarial examples. The lower the accuracy, the higher the effectiveness of the adversarial attack.
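
A minimal sketch of how effectiveness can be computed, assuming a target model, a labelled data loader and an attack function (for example, the BIM sketch above); these names are assumptions for illustration, not part of a specific library.

```python
import torch

@torch.no_grad()
def accuracy(model, examples):
    """examples: an iterable of (image_batch, label_batch) pairs."""
    correct, total = 0, 0
    for x, y in examples:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / max(total, 1)

# clean_acc = accuracy(model, loader)                                    # accuracy on clean images
# adv_acc = accuracy(model, [(bim_attack(x, y), y) for x, y in loader])  # accuracy on adversarial examples
# The lower adv_acc is compared with clean_acc, the more effective the attack.
```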

Robustness
The robustness of an adversarial attack is its effectiveness in the presence of a defense that removes the effect of the adversarial perturbation δ before the data are processed by the target model f(·). Examples of defenses include median filtering, requantization, and JPEG compression. Robustness can be measured as the difference in accuracy of the target model over a target dataset when a defense is used with respect to when no defense is used. The smaller this difference, the higher the robustness of the adversarial attack.
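
A minimal sketch of how robustness can be measured, using JPEG re-encoding as the example defense mentioned above; the accuracy helper and the set of adversarial examples are assumed from the effectiveness sketch, and the quality factor is an illustrative choice.

```python
import io
import torch
from PIL import Image
from torchvision.transforms.functional import to_pil_image, to_tensor

def jpeg_defense(x, quality=75):
    """Re-encode each image of the batch as JPEG to attenuate the adversarial perturbation."""
    defended = []
    for img in x:  # img: a 3xHxW tensor in [0, 1]
        buf = io.BytesIO()
        to_pil_image(img.clamp(0, 1)).save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        defended.append(to_tensor(Image.open(buf)))
    return torch.stack(defended)

# adv_acc = accuracy(model, adversarial_set)
# defended_adv_acc = accuracy(model, [(jpeg_defense(x), y) for x, y in adversarial_set])
# The smaller the difference between the two accuracies, the more robust the attack to this defense.
```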

Transferability
The transferability of an adversarial attack is the extent to which a perturbation δ crafted for a target model f(·) is effective in misleading another model f'(·) that was not used to generate the adversarial perturbation δ. Transferability can be measured as the difference in accuracy of f(·) and f'(·) over a target dataset of adversarial examples crafted for the target model f(·). The smaller this difference, the higher the transferability of the attack to the classifier f'(·).
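
A minimal sketch of how transferability can be measured, reusing the accuracy helper from the effectiveness sketch; the two pretrained ImageNet classifiers chosen for f(·) and f'(·) are illustrative assumptions.

```python
import torchvision.models as models

# Target model f and a second model f_prime that was not used to craft the perturbation.
f = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1).eval()
f_prime = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

def transferability_gap(f, f_prime, adversarial_examples):
    """Difference in accuracy of f and f_prime over examples crafted for f:
    the smaller the difference, the higher the transferability to f_prime."""
    return accuracy(f_prime, adversarial_examples) - accuracy(f, adversarial_examples)

# adv = [(bim_attack(x, y), y) for x, y in loader]  # examples crafted against f only
# print(transferability_gap(f, f_prime, adv))
```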

Noticeability
The noticeability of an adversarial attack is the extent to which an adversarial perturbation δ can be seen as such by a person looking at an image or video. Noticeability can be measured with a double stimulus test that compares pairs of images or video clips; with a single stimulus test on the naturalness of images or video clips, controlled by the results obtained with the corresponding original image or video clip; or with a (reliable) no-reference perceptual quality measure.


In addition to these four main properties, other properties, such as detectability and reversibility, may be considered when analyzing or evaluating adversarial attacks for specific tasks or objectives.

Detectability
The detectability of an adversarial attack is the extent to which a defense mechanism is capable of identifying that a perturbation was applied to modify an original image, video, or scene. Detectability, which is related to robustness, can be measured as the proportion of adversarial examples that are detected as such in a given dataset or scenario. A (successful) defense can be used to determine the detectability of an attack by comparing the output of the target model f(·) on a given input with its output on the same input preprocessed by the defense, as differing outputs suggest the presence of an attack.
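
A minimal sketch of this detection strategy, assuming the model and the jpeg_defense pre-processing from the previous sketches; the decision rule simply flags an input whose prediction changes after the defense is applied.

```python
import torch

@torch.no_grad()
def flags_attack(model, x):
    """Boolean per image: True if the prediction changes after the defense pre-processing."""
    return model(x).argmax(dim=1) != model(jpeg_defense(x)).argmax(dim=1)

# Detectability of an attack: the fraction of its adversarial examples flagged as such.
# detection_rate = torch.cat([flags_attack(model, x) for x, _ in adversarial_set]).float().mean().item()
```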

Reversibility
The reversibility of an adversarial attack is the extent to which an analysis of the predictions or output labels of f(·) may support the retrieval of the original class of an adversarial example. For instance, an analysis of the frequency of the adversarial-to-original prediction mapping revealed that untargeted attacks are more reversible than targeted attacks.
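
A minimal sketch of the frequency analysis mentioned above, under the assumption that the adversarial predictions and the corresponding original labels of a set of examples are available; for each adversarial prediction, the most frequent original class is used as the reversal guess.

```python
from collections import Counter, defaultdict

def reversal_map(adv_predictions, original_labels):
    """For each adversarial prediction, the original class it most frequently replaces."""
    counts = defaultdict(Counter)
    for adv, orig in zip(adv_predictions, original_labels):
        counts[adv][orig] += 1
    return {adv: c.most_common(1)[0][0] for adv, c in counts.items()}

# mapping = reversal_map(adv_predictions, original_labels)  # built on a held-out set
# recovered = [mapping.get(p) for p in adv_predictions]     # guessed original classes
```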



Adversarial attacks for tasks that use images as input


Legend

Reference Method box T T B B Approach Datasets Tasks
Szegedy et al. (2014)Intriguing properties of neural networks
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.
Proceedings of the International Conference on Learning Representations, 2014.
L-BFGS Opt ImageNet, MNIST, YouTube C
Carlini and Wagner (2017)Towards evaluating the robustness of neural networks
Carlini, N., Wagner, D.
Proceedings of the IEEE Symposium on Security and Privacy, 2017.
CW Opt MNIST, CIFAR C
Goodfellow et al. (2015)Explaining and harnessing adversarial examples
Goodfellow, I., Shlens, J., Szegedy, C.
Proceedings of the International Conference on Learning Representations, 2015.
FGSM Grad MNIST C
Kurakin et al. (2017)Adversarial machine learning at scale
Kurakin, A., Goodfellow, I., Bengio, S.
Proceedings of the International Conference on Learning Representations, 2017.
BIM (I-FGSM) Grad ImageNet C
Madry et al. (2018)Towards deep learning models resistant to adversarial attacks
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.
Proceedings of the International Conference on Learning Representations, 2018.
PGD Grad MNIST, CIFAR C
Papernot et al. (2016)The limitations of deep learning in adversarial settings
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.
Proceedings of the IEEE European Symposium on Security and Privacy, 2016.
JSMA Grad MNIST C
Moosavi et al. (2016)Deepfool: a simple and accurate method to fool deep neural networks
Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
DeepFool Grad ImageNet, MNIST, CIFAR C
Modas et al. (2019)Sparsefool: a few pixels make a big difference
Modas, A., Moosavi-Dezfooli, S.M., Frossard, P.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
SparseFool Grad ImageNet, MNIST, CIFAR C
Xie et al. (2019)Improving transferability of adversarial examples with input diversity
Xie, C., Zhang, Z., Zhou, Y., Bai, S., Wang, J., Ren, Z., Yuille, A.L.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
DI2-FGSM ○, ● Grad ImageNet C
Tramèr et al. (2018)Ensemble adversarial training: attacks and defenses
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., McDaniel, P.
Proceedings of the International Conference on Learning Representations, 2018.
E-FGSM ○, ● Grad ImageNet C
Li et al. (2019)Scene privacy protection
Li, C.Y., Shahin Shamsabadi, A., Sanchez-Matilla, R., Mazzon, R., Cavallaro, A.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019.
P-FGSM Grad Places C
Sanchez-Matilla et al. (2020)Exploiting vulnerabilities of deep neural networks for privacy protection
Sanchez-Matilla, R., Li, C.Y., Shamsabadi, A.S., Mazzon, R., Cavallaro, A.
IEEE Transactions on Multimedia, vol. 22, pp. 1862–187, 2020.
RP-FGSM Grad Places C
Moosavi et al. (2017)Universal adversarial perturbations
Moosavi-Dezfooli, S.M., Fawzi, A., Fawzi, O., Frossard, P.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017.
UAP Opt ImageNet C
Mopuri et al. (2017)Fast feature fool: a data independent approach to universal adversarial perturbations
Mopuri, K.R., Garg, U., Babu, R.V.
Proceedings of the British Machine Vision Conference, 2017.
Fast Feature Fool Opt ImageNet, Places-205 C
Baluja and Fischer (2018)Learning to attack: adversarial transformation networks
Baluja, S., Fischer, I.
Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
ATN Opt ImageNet, MNIST C
Xiao et al. (2018)Generating adversarial examples with adversarial networks
Xiao, C., Li, B., Zhu, J., He, W., Liu, M., Song, D.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018.
AdvGAN ○, ● Opt ImageNet, MNIST C
Poursaeed et al. (2018)Generative adversarial perturbations
Poursaeed, O., Katsman, I., Gao, B., Belongie, S.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
GAP Opt ImageNet, Cityscapes C,S
Mopuri et al. (2018)NAG: network for adversary generation
Mopuri, K.R., Ojha, U., Garg, U., Babu, R.V.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
NAG ○, ● Opt ImageNet C
Bhattad et al. (2020)Unrestricted adversarial examples via semantic manipulation
Bhattad, A., Chong, M.J., Liang, K., Li, B., Forsyth, D.
Proceedings of the International Conference on Learning Representations, 2020.
Semantic Manipulation Opt ImageNet, MSCOCO C,IC
Shamsabadi et al. (2020)Edgefool: an adversarial image enhancement filter
Shamsabadi, A.S., Oh, C., Cavallaro, A.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2020.
EdgeFool Opt ImageNet, Places C
Papernot et al. (2016)Transferability in machine learning: from phenomena to black-box attacks using adversarial samples
Papernot, N., McDaniel, P., Goodfellow, I.
arXiv:1605.07277, 2016.
SBA Bound MNIST C
Shi et al. (2019)Curls & whey: boosting black-box adversarial attacks
Shi, Y., Wang, S., Han, Y.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
Curls & Whey Bound ImageNet C
Dong et al. (2019)Evading defenses to transferable adversarial examples by translation invariant attacks
Dong, Y., Pang, T., Su, H., Zhu, J.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
TI Attack Bound ImageNet C
Chen et al. (2017)ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models
Chen, P.Y., Zhang, H., Sharma, Y., Yi, J., Hsieh, C.J.
Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, 2017.
ZOO GradE ImageNet, MNIST, CIFAR C
Ilyas et al. (2018)Black-box adversarial attacks with limited queries and information
Ilyas, A., Engstrom, L., Athalye, A., Lin, J.
Proceedings of the International Conference on Machine Learning, 2018.
Query-limited Attack GradE ImageNet C
Tu et al. (2019)AutoZOOM: Autoencoder-based zeroth order optimization method for attacking black-box neural networks
Tu, C.C., Ting, P., Chen, P.Y., Liu, S., Zhang, H., Yi, J., Hsieh, C.J., Cheng, S.M.
Proceedings of the AAAI Conference on Artificial Intelligence, 2019.
AutoZOOM GradE ImageNet, MNIST, CIFAR C
Narodytska and Kasiviswanathan (2017)Simple black-box adversarial attacks on deep neural networks
Narodytska, N., Kasiviswanathan, S.P.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017.
LocSearchAdv GradE ImageNet, MNIST, CIFAR, SVHN, STL C
Brendel et al. (2018)Decision-based adversarial attacks: reliable attacks against black-box machine learning models
Brendel, W., Rauber, J., Bethge, M.
Proceedings of the International Conference on Learning Representations, 2018.
BA LocS ImageNet, MNIST, CIFAR C
Guo et al. (2019)Simple black-box adversarial attacks
Guo, C., Gardner, J., You, Y., Wilson, A.G., Weinberger, K.
Proceedings of the International Conference on Machine Learning, 2019.
SimBA LocS ImageNet C
Hosseini and Poovendran (2018)Semantic adversarial examples
Hosseini, H., Poovendran, R.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018.
SemanticAdv RanS CIFAR C
Shamsabadi et al. (2020)ColorFool: semantic adversarial colorization
Shamsabadi, A.S., Sanchez-Matilla, R., Cavallaro, A.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
ColorFool RanS ImageNet, CIFAR, Places C
Fischer et al. (2017)Adversarial examples for semantic image segmentation
Fischer, V., Kumar, M.C., Metzen, J.H., Brox, T.
Proceedings of the International Conference on Machine Learning Workshop, 2017.
SSA Grad Cityscapes S
Xie et al. (2017)Adversarial examples for semantic segmentation and object detection
Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., Yuille, A.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2017.
DAG Grad VOC S,D
Wei et al. (2019)Transferable adversarial attacks for image and video object detection
Wei, X., Liang, S., Chen, N., Cao, X.
Proceedings of the International Joint Conference on Artificial Intelligence, 2019.
UEA ○, ● Opt ImageNet VID D
Cosgrove and Yuille (2020)Adversarial examples for edge detection: they exist, and they transfer
Cosgrove, C., Yuille, A.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020.
Edge Attack Grad Cityscapes E

Submit a new attack


Adversarial attacks for tasks that use videos as input


Legend

Reference Method box T T DD U B B R Approach Datasets Tasks
Ranjan et al. (2019)Attacking optical flow
Ranjan, A., Janai, J., Geiger, A., Black, M.J.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
OFA ○, ● Opt KITTI ME
Liang et al. (2020)Efficient adversarial attacks for visual object tracking
Liang, S., Wei, X., Yao, S., Cao, X.
Proceedings of the European Conference on Computer Vision, 2020.
FAN ○, ● Gen OTB, VOT OT
Jiang et al. (2019)Black-box adversarial attacks on video recognition models
Jiang, L., Ma, X., Chen, S., Bailey, J., Jiang, Y.
Proceedings of the ACM International Conference on Multimedia, 2019.
V-BAD Opt UCF, HMDB, Kinetics C
Li et al. (2019b)Stealthy adversarial perturbations against real-time video classification systems
Li, S., Neupane, A., Paul, S., Song, C., Krishnamurthy, S.V., Chowdhury, A.K.R., Swami, A.
Proceedings of the Network and Distributed Systems Security Symposium, 2019.
C-DUP Gen UCF, JESTER C
Lo and Patel (2021)MultAV: multiplicative adversarial videos
Lo, S., Patel, V.M.
Proceedings of the IEEE International Conference on Advanced Video and Signal-based Surveillance, 2021.
MultAV Direct, Opt UCF C

Submit a new attack


Defenses against adversarial attacks


Legend

Reference Method Goal Approach Datasets Tasks
Hendrycks and Gimpel (2017)Early methods for detecting adversarial images
Hendrycks, D., Gimpel, K.
Proceedings of the International Conference on Learning Representations Workshop, 2017.
PCA Dec Stat ImageNet, MNIST, CIFAR C
Grosse et al. (2017)On the (statistical) detection of adversarial examples
Grosse, K., Manoharan, P., Papernot, N., Backes, M., McDaniel, P.
arXiv:1702.06280, 2017.
Statistical test Dec Stat MNIST C
Gong et al. (2017)Adversarial and clean data are not twins
Gong, Z., Wang, W., Ku, W.S.
arXiv:1704.04960, 2017.
Binary Classifier Dec Aux MNIST, CIFAR, SVHN C
Metzen et al. (2017)On detecting adversarial perturbations
Metzen, J.H., Genewein, T., Fischer, V., Bischoff, B.
Proceedings of the International Conference on Learning Representations, 2017.
Adversary Detector Net. Dec Aux ImageNet, CIFAR C
Feinman et al. (2017)Detecting adversarial samples from artifacts
Feinman, R., Curtin, R.R., Shintre, S., Gardner, A.B.
arXiv:1703.00410, 2017.
KDE & BUE Dec KDE & BUE MNIST, CIFAR C
Xu et al. (2018)Feature squeezing: detecting adversarial examples in deep neural networks
Xu, W., Evans, D., Qi, Y.
Proceedings of the Network and Distributed System Security Symposium, 2018.
Feature Squeezing Dec Feature Squeezing MNIST, CIFAR C
Buckman et al. (2018)Thermometer encoding: one hot way to resist adversarial examples
Buckman, J., Roy, A., Raffel, C., Goodfellow, I.
Proceedings of the International Conference on Learning Representations, 2018.
Thermometer Encoding GradM NonDiff MNIST, CIFAR, SVHN C
Guo et al. (2018)Countering adversarial images using input transformations
Guo, C., Rana, M., Cisse, M., van der Maaten, L.
Proceedings of the International Conference on Learning Representations, 2018.
Image Transformations GradM NonDiff ImageNet C
Papernot et al. (2016)Distillation as a defense to adversarial perturbations against deep neural networks
Papernot, N., McDaniel, P., Wu, X., Jha, S., Swami, A.
Proceedings of the IEEE Symposium on Security and Privacy, 2016.
Defensive Distillation GradM Van/Exp MNIST, CIFAR C
Song et al. (2018)Pixeldefend: leveraging generative models to understand and defend against adversarial examples
Song, Y., Kim, T., Nowozin, S., Ermon, S., Kushman, N.
Proceedings of the International Conference on Learning Representations, 2018.
PixelDefend GradM Van/Exp F-MNIST, CIFAR C
Samangouei et al. (2018)Defense-GAN: protecting classifiers against adversarial attacks using generative models
Samangouei, P., Kabkab, M., Chellappa, R.
Proceedings of the International Conference on Learning Representations, 2018.
Defense-GAN GradM Van/Exp F-MNIST C
Zhou et al. (2020)Manifold projection for adversarial defense on face recognition
Zhou, J., Liang, C., Chen, J.
Proceedings of the European Conference on Computer Vision, 2020.
A-VAE GradM Van/Exp LFW F
Dhillon et al. (2018)Stochastic activation pruning for robust adversarial defense
Dhillon, G.S., Azizzadenesheli, K., Lipton, Z.C., Bernstein, J.D., Kossaifi, J., Khanna, A., Anandkumar, A.
Proceedings of the International Conference on Learning Representations, 2018.
Stochastic pruning GradM Stoch CIFAR C
Xie et al. (2018)Mitigating adversarial effects through randomization
Xie, C., Wang, J., Zhang, Z., Ren, Z., Yuille, A.
Proceedings of the International Conference on Learning Representations, 2018.
Randomization GradM Stoch ImageNet C
Gu and Rigazio (2014)Towards deep neural network architectures robust to adversarial examples
Gu, S., Rigazio, L.
arXiv preprint. arXiv:1412.5068, 2014.
Deep Contractive Net. ModelR Reg MNIST C
Cisse et al. (2017)Parseval networks: improving robustness to adversarial examples
Cisse, M., Bojanowski, P., Grave, E., Dauphin, Y., Usunier, N.
Proceedings of International Conference on Machine Learning, 2017.
Parseval Net ModelR Reg CIFAR, SVHN C
Goodfellow et al. (2015)Explaining and harnessing adversarial examples
Goodfellow, I., Shlens, J., Szegedy, C.
Proceedings of the International Conference on Learning Representations, 2015.
AdvTrain ModelR AdvT MNIST C
Madry et al. (2018)Towards deep learning models resistant to adversarial attacks
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.
Proceedings of the International Conference on Learning Representations, 2018.
PGD AdvTrain ModelR AdvT MNIST, CIFAR C
Tramèr et al. (2018)Ensemble adversarial training: attacks and defenses
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., McDaniel, P.
Proceedings of the International Conference on Learning Representations, 2018.
Ensemble AdvTrain ModelR AdvT ImageNet C
Zhang and Wang (2019)Towards adversarially robust object detection
Zhang, H., Wang, J.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
AROD ModelR AdvT PascalVOC, COCO D
Jia et al. (2020)Robust tracking against adversarial attacks
Jia, S., Ma, C., Song, Y., Yang, X.
Proceedings of the European Conference on Computer Vision, 2020.
RT ModelR AdvT OTB, VOT, UAV T
Raghunathan et al. (2018)Certified defenses against adversarial examples
Raghunathan, A., Steinhardt, J., Liang, P.
Proceedings of the International Conference on Learning Representations, 2018.
Single Semidefinite Relax. ModelR Cert MNIST C
Wong and Kolter (2018)Provable defenses against adversarial examples via the convex outer adversarial polytope
Wong, E., Kolter, Z.
Proceedings of the International Conference on Machine Learning, 2018.
Deep ReLU Net ModelR Cert F-MNIST, HAR, SVHN C
Arnab et al. (2018)On the robustness of semantic segmentation models to adversarial attacks
Arnab, A., Miksik, O., Torr, P.H.S.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
RSSM ModelR Exp PascalVOC, Cityscapes S

Submit a new defense


Citation

If you use any content from this page, please cite the following reference:

C. Oh, A. Xompero, and A. Cavallaro, Visual adversarial attacks and defenses, in Advanced Methods and Deep Learning in Computer Vision, Editors: E. R. Davies and M. Turk, 1st Edition, Elsevier, pages 511-543, November 2021.



If you have any enquiries, contact us at cis-web@eecs.qmul.ac.uk.