Adaptive Discounting of Training Time Attacks

Bector, Ridhima; Aradhya, Abhay; Quek, Chai; Rabinovich, Zinovi

arXiv:2401.02652 (cs)

[Submitted on 5 Jan 2024]

Abstract: Among the most insidious attacks on Reinforcement Learning (RL) solutions are training-time attacks (TTAs) that create loopholes and backdoors in the learned behaviour. Not limited to a simple disruption, constructive TTAs (C-TTAs) are now available, where the attacker forces a specific, target behaviour upon a training RL agent (victim). However, even state-of-the-art C-TTAs focus on target behaviours that could be naturally adopted by the victim if not for a particular feature of the environment dynamics, which C-TTAs exploit. In this work, we show that a C-TTA is possible even when the target behaviour is un-adoptable due to both environment dynamics and non-optimality with respect to the victim objective(s). To find efficient attacks in this context, we develop a specialised flavour of the DDPG algorithm, which we term gammaDDPG, that learns this stronger version of C-TTA. gammaDDPG dynamically alters the attack policy planning horizon based on the victim's current behaviour. This improves effort distribution throughout the attack timeline and reduces the effect of uncertainty the attacker has about the victim. To demonstrate the features of our method and better relate the results to prior research, we borrow a 3D grid domain from a state-of-the-art C-TTA for our experiments. Code is available at this http URL.
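As a rough illustration of what "dynamically altering the planning horizon" can mean in a DDPG-style learner, the sketch below lets the discount factor vary per transition inside the standard one-step TD target. It is not the paper's gammaDDPG: the function names, the `adaptive_gamma` schedule, its bounds, and the direction of the mapping (larger behavioural distance gives a longer horizon) are all assumptions made only for illustration. The only standard facts used are the TD target y = r + gamma * (1 - done) * Q'(s', a') and the effective horizon 1 / (1 - gamma).

```python
from typing import List, Sequence


def effective_horizon(gamma: float) -> float:
    """Approximate planning horizon implied by a discount factor in [0, 1)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def adaptive_gamma(distance: float,
                   gamma_min: float = 0.5,
                   gamma_max: float = 0.99) -> float:
    """HYPOTHETICAL schedule, not taken from the paper.

    Maps a normalised distance in [0, 1] between the victim's current
    behaviour and the target behaviour to a discount factor by linear
    interpolation. Inputs outside [0, 1] are clamped.
    """
    d = min(max(distance, 0.0), 1.0)
    return gamma_min + d * (gamma_max - gamma_min)


def td_targets(rewards: Sequence[float],
               next_q: Sequence[float],
               gammas: Sequence[float],
               dones: Sequence[bool]) -> List[float]:
    """One-step TD targets with a separate discount factor per transition.

    next_q holds the target critic's estimate Q'(s', a') for each transition;
    terminal transitions (done=True) drop the bootstrap term.
    """
    return [
        r + g * (0.0 if done else 1.0) * q
        for r, q, g, done in zip(rewards, next_q, gammas, dones)
    ]
```

Under these assumptions, a discount factor of 0.5 corresponds to an effective horizon of 2 steps, while 0.99 corresponds to roughly 100 steps, so changing gamma per transition changes how far ahead the attacker's critic effectively plans.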

Comments:	19 pages, 7 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cite as:	arXiv:2401.02652 [cs.LG]
	(or arXiv:2401.02652v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2401.02652

Submission history

From: Ridhima Bector
[v1] Fri, 5 Jan 2024 06:03:14 UTC (17,124 KB)
