Showing 1–1 of 1 results for author: Aradhya, A

  1. arXiv:2401.02652  [pdf, other]

    cs.LG cs.AI cs.CR

    Adaptive Discounting of Training Time Attacks

    Authors: Ridhima Bector, Abhay Aradhya, Chai Quek, Zinovi Rabinovich

    Abstract: Among the most insidious attacks on Reinforcement Learning (RL) solutions are training-time attacks (TTAs) that create loopholes and backdoors in the learned behaviour. Not limited to a simple disruption, constructive TTAs (C-TTAs) are now available, where the attacker forces a specific, target behaviour upon a training RL agent (victim). However, even state-of-the-art C-TTAs focus on target behav…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 19 pages, 7 figures