Department of Computer Science
Adversarial examples are inputs containing minor perturbations that cause a model to output an incorrect prediction. Deep neural networks have been shown to be highly vulnerable to these attacks, and this vulnerability represents both a security risk for the use of deep learning models in security-conscious fields and an opportunity to improve our understanding of how neural networks generalize to unexpected inputs. Transfer attacks are an important subcategory of adversarial attacks. In a transfer attack, the adversary builds an adversarial attack using a surrogate model, then uses that attack to fool an unseen target model. Recent research in this subfield has focused on attack generation methods that improve transferability between models and on ensemble-based attacks. We show that optimizing a single surrogate model is a more effective method of improving adversarial transfer, using the simple example of an undertrained surrogate. This method of attack generation transfers well across varied architectures and outperforms state-of-the-art methods. To interpret the effectiveness of undertrained surrogate models, we represent adversarial transferability as a function of the curvature of the surrogate model's loss function and the similarity between surrogate and target gradients, and we show that our approach reduces the presence of local loss maxima which hinder transferability. Our results suggest that finding good single surrogate models is a highly effective and simple method for generating transferable adversarial attacks, and that this method represents a valuable route for future study in this field.
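The transfer-attack setting described above can be illustrated with a minimal sketch. The example below is not the thesis's method: it uses two hypothetical hand-set linear classifiers (`w_surrogate`, `w_target`) standing in for the surrogate and target networks, and a one-step FGSM-style perturbation computed only from the surrogate's gradient. It shows the core protocol: the adversary never queries the target while crafting the attack, yet the perturbation still flips the target's prediction when the two models' gradients are similar.

```python
import numpy as np

# Hypothetical stand-ins for trained networks: two linear classifiers
# that agree in sign on every feature (i.e., their gradients are similar).
w_surrogate = np.array([1.0, -2.0, 0.5])
w_target = np.array([0.8, -1.5, 0.7])

x = np.array([0.2, -0.1, 0.3])  # clean input
y = 1.0                          # true label (sign convention: +1 / -1)
eps = 0.5                        # perturbation budget (L-infinity)

# For the margin loss L(x) = -y * (w . x), the gradient w.r.t. x is -y * w.
# FGSM takes one step of size eps in the sign of the surrogate's gradient.
grad_surrogate = -y * w_surrogate
x_adv = x + eps * np.sign(grad_surrogate)

clean_score = float(w_target @ x)    # target's prediction on the clean input
adv_score = float(w_target @ x_adv)  # target's prediction on the transferred attack

print(clean_score)  # positive: target is correct on the clean input
print(adv_score)    # negative: the attack crafted on the surrogate transfers
```

Note that the target model is consulted only to evaluate the attack, never to build it; this is the query-free property in the title. With real networks, `grad_surrogate` would come from backpropagation through the surrogate rather than a closed form.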
Miller, Christopher S., "Query Free Adversarial Transfer via Undertrained Surrogates" (2020). Dartmouth College Undergraduate Theses. 156.