-
Designing Chatbots to Support Victims and Survivors of Domestic Abuse
Authors:
Rahime Belen Saglam,
Jason R. C. Nurse,
Lisa Sugiura
Abstract:
Objective: Domestic abuse cases have risen significantly over the last four years, in part due to the COVID-19 pandemic and the challenges for victims and survivors in accessing support. In this study, we investigate the role that chatbots - Artificial Intelligence (AI) and rule-based - may play in supporting victims/survivors in situations such as these or where direct access to help is limited.…
▽ More
Objective: Domestic abuse cases have risen significantly over the last four years, in part due to the COVID-19 pandemic and the challenges for victims and survivors in accessing support. In this study, we investigate the role that chatbots - Artificial Intelligence (AI) and rule-based - may play in supporting victims/survivors in situations such as these or where direct access to help is limited. Methods: Interviews were conducted with experts working in domestic abuse support services and organizations (e.g., charities, law enforcement) and the content of websites of related support-service providers was collected. Thematic content analysis was then applied to assess and extract insights from the interview data and the content on victim-support websites. We also reviewed pertinent chatbot literature to reflect on studies that may inform design principles and interaction patterns for agents used to support victims/survivors. Results: From our analysis, we outlined a set of design considerations/practices for chatbots that consider potential use cases and target groups, dialog structure, personality traits that might be useful for chatbots to possess, and finally, safety and privacy issues that should be addressed. Of particular note are situations where AI systems (e.g., ChatGPT, CoPilot, Gemini) are not recommended for use, the value of conveying emotional support, the importance of transparency, and the need for a safe and confidential space. Conclusion: It is our hope that these considerations/practices will stimulate debate among chatbots and AI developers and service providers and - for situations where chatbots are deemed appropriate for use - inspire efficient use of chatbots in the support of survivors of domestic abuse.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
Deep Reinforcement Learning Based Joint Downlink Beamforming and RIS Configuration in RIS-aided MU-MISO Systems Under Hardware Impairments and Imperfect CSI
Authors:
Baturay Saglam,
Doga Gurgunoglu,
Suleyman S. Kozat
Abstract:
We introduce a novel deep reinforcement learning (DRL) approach to jointly optimize transmit beamforming and reconfigurable intelligent surface (RIS) phase shifts in a multiuser multiple input single output (MU-MISO) system to maximize the sum downlink rate under the phase-dependent reflection amplitude model. Our approach addresses the challenge of imperfect channel state information (CSI) and ha…
▽ More
We introduce a novel deep reinforcement learning (DRL) approach to jointly optimize transmit beamforming and reconfigurable intelligent surface (RIS) phase shifts in a multiuser multiple input single output (MU-MISO) system to maximize the sum downlink rate under the phase-dependent reflection amplitude model. Our approach addresses the challenge of imperfect channel state information (CSI) and hardware impairments by considering a practical RIS amplitude model. We compare the performance of our approach against a vanilla DRL agent in two scenarios: perfect CSI and phase-dependent RIS amplitudes, and mismatched CSI and ideal RIS reflections. The results demonstrate that the proposed framework significantly outperforms the vanilla DRL agent under mismatch and approaches the golden standard. Our contributions include modifications to the DRL approach to address the joint design of transmit beamforming and phase shifts and the phase-dependent amplitude model. To the best of our knowledge, our method is the first DRL-based approach for the phase-dependent reflection amplitude model in RIS-aided MU-MISO systems. Our findings in this study highlight the potential of our approach as a promising solution to overcome hardware impairments in RIS-aided wireless communication systems.
△ Less
Submitted 29 March, 2023; v1 submitted 10 October, 2022;
originally announced November 2022.
-
Deep Intrinsically Motivated Exploration in Continuous Control
Authors:
Baturay Saglam,
Suleyman S. Kozat
Abstract:
In continuous control, exploration is often performed through undirected strategies in which parameters of the networks or selected actions are perturbed by random noise. Although the deep setting of undirected exploration has been shown to improve the performance of on-policy methods, they introduce an excessive computational complexity and are known to fail in the off-policy setting. The intrins…
▽ More
In continuous control, exploration is often performed through undirected strategies in which parameters of the networks or selected actions are perturbed by random noise. Although the deep setting of undirected exploration has been shown to improve the performance of on-policy methods, they introduce an excessive computational complexity and are known to fail in the off-policy setting. The intrinsically motivated exploration is an effective alternative to the undirected strategies, but they are usually studied for discrete action domains. In this paper, we investigate how intrinsic motivation can effectively be combined with deep reinforcement learning in the control of continuous systems to obtain a directed exploratory behavior. We adapt the existing theories on animal motivational systems into the reinforcement learning paradigm and introduce a novel and scalable directed exploration strategy. The introduced approach, motivated by the maximization of the value function's error, can benefit from a collected set of experiences by extracting useful information and unify the intrinsic exploration motivations in the literature under a single exploration objective. An extensive set of empirical studies demonstrate that our framework extends to larger and more diverse state spaces, dramatically improves the baselines, and outperforms the undirected strategies significantly.
△ Less
Submitted 1 October, 2022;
originally announced October 2022.
-
Actor Prioritized Experience Replay
Authors:
Baturay Saglam,
Furkan B. Mutlu,
Dogan C. Cicek,
Suleyman S. Kozat
Abstract:
A widely-studied deep reinforcement learning (RL) technique known as Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error. Although it has been shown that PER is one of the most crucial components for the overall performance of deep RL methods in discrete action domains, many empirical…
▽ More
A widely-studied deep reinforcement learning (RL) technique known as Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error. Although it has been shown that PER is one of the most crucial components for the overall performance of deep RL methods in discrete action domains, many empirical studies indicate that it considerably underperforms actor-critic algorithms in continuous control. We theoretically show that actor networks cannot be effectively trained with transitions that have large TD errors. As a result, the approximate policy gradient computed under the Q-network diverges from the actual gradient computed under the optimal Q-function. Motivated by this, we introduce a novel experience replay sampling framework for actor-critic methods, which also regards issues with stability and recent findings behind the poor empirical performance of PER. The introduced algorithm suggests a new branch of improvements to PER and schedules effective and efficient training for both actor and critic networks. An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms the competing approaches and obtains state-of-the-art results over the standard off-policy actor-critic algorithms.
△ Less
Submitted 1 September, 2022;
originally announced September 2022.
-
Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach
Authors:
Baturay Saglam,
Dogan C. Cicek,
Furkan B. Mutlu,
Suleyman S. Kozat
Abstract:
Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy between the underlying distributions of the agent's policy and collected data increases. Although the well-studied importance sampling and off-policy policy gradient…
▽ More
Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy between the underlying distributions of the agent's policy and collected data increases. Although the well-studied importance sampling and off-policy policy gradient techniques were proposed to compensate for this discrepancy, they usually require a collection of long trajectories and induce additional problems such as vanishing/exploding gradients or discarding many useful experiences, which eventually increases the computational complexity. Moreover, their generalization to either continuous action domains or policies approximated by deterministic deep neural networks is strictly limited. To overcome these limitations, we introduce a novel policy similarity measure to mitigate the effects of such discrepancy in continuous control. Our method offers an adequate single-step off-policy correction that is applicable to deterministic policy networks. Theoretical and empirical studies demonstrate that it can achieve a "safe" off-policy learning and substantially improve the state-of-the-art by attaining higher returns in fewer steps than the competing methods through an effective schedule of the learning rate in Q-learning and policy optimization.
△ Less
Submitted 25 September, 2023; v1 submitted 1 August, 2022;
originally announced August 2022.
-
Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms
Authors:
Baturay Saglam,
Dogan C. Cicek,
Furkan B. Mutlu,
Suleyman S. Kozat
Abstract:
Learning in high dimensional continuous tasks is challenging, mainly when the experience replay memory is very limited. We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains for the future off-policy deep reinforcement learning applications in which the allocated memory for the experience replay buffer is limited. To overcome the e…
▽ More
Learning in high dimensional continuous tasks is challenging, mainly when the experience replay memory is very limited. We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains for the future off-policy deep reinforcement learning applications in which the allocated memory for the experience replay buffer is limited. To overcome the extrapolation error induced by learning from other agents' experiences, we facilitate our algorithm with a novel off-policy correction technique without any action probability estimates. We test the effectiveness of our method in challenging OpenAI Gym continuous control tasks and conclude that it can achieve a safe experience sharing across multiple agents and exhibits a robust performance when the replay memory is strictly limited.
△ Less
Submitted 27 July, 2022;
originally announced July 2022.
-
AWD3: Dynamic Reduction of the Estimation Bias
Authors:
Dogan C. Cicek,
Enes Duran,
Baturay Saglam,
Kagan Kaya,
Furkan B. Mutlu,
Suleyman S. Kozat
Abstract:
Value-based deep Reinforcement Learning (RL) algorithms suffer from the estimation bias primarily caused by function approximation and temporal difference (TD) learning. This problem induces faulty state-action value estimates and therefore harms the performance and robustness of the learning algorithms. Although several techniques were proposed to tackle, learning algorithms still suffer from thi…
▽ More
Value-based deep Reinforcement Learning (RL) algorithms suffer from the estimation bias primarily caused by function approximation and temporal difference (TD) learning. This problem induces faulty state-action value estimates and therefore harms the performance and robustness of the learning algorithms. Although several techniques were proposed to tackle, learning algorithms still suffer from this bias. Here, we introduce a technique that eliminates the estimation bias in off-policy continuous control algorithms using the experience replay mechanism. We adaptively learn the weighting hyper-parameter beta in the Weighted Twin Delayed Deep Deterministic Policy Gradient algorithm. Our method is named Adaptive-WD3 (AWD3). We show through continuous control environments of OpenAI gym that our algorithm matches or outperforms the state-of-the-art off-policy policy gradient learning algorithms.
△ Less
Submitted 12 November, 2021;
originally announced November 2021.
-
Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay
Authors:
Dogan C. Cicek,
Enes Duran,
Baturay Saglam,
Furkan B. Mutlu,
Suleyman S. Kozat
Abstract:
The experience replay mechanism allows agents to use the experiences multiple times. In prior works, the sampling probability of the transitions was adjusted according to their importance. Reassigning sampling probabilities for every transition in the replay buffer after each iteration is highly inefficient. Therefore, experience replay prioritization algorithms recalculate the significance of a t…
▽ More
The experience replay mechanism allows agents to use the experiences multiple times. In prior works, the sampling probability of the transitions was adjusted according to their importance. Reassigning sampling probabilities for every transition in the replay buffer after each iteration is highly inefficient. Therefore, experience replay prioritization algorithms recalculate the significance of a transition when the corresponding transition is sampled to gain computational efficiency. However, the importance level of the transitions changes dynamically as the policy and the value function of the agent are updated. In addition, experience replay stores the transitions are generated by the previous policies of the agent that may significantly deviate from the most recent policy of the agent. Higher deviation from the most recent policy of the agent leads to more off-policy updates, which is detrimental for the agent. In this paper, we develop a novel algorithm, Batch Prioritizing Experience Replay via KL Divergence (KLPER), which prioritizes batch of transitions rather than directly prioritizing each transition. Moreover, to reduce the off-policyness of the updates, our algorithm selects one batch among a certain number of batches and forces the agent to learn through the batch that is most likely generated by the most recent policy of the agent. We combine our algorithm with Deep Deterministic Policy Gradient and Twin Delayed Deep Deterministic Policy Gradient and evaluate it on various continuous control tasks. KLPER provides promising improvements for deep deterministic continuous control algorithms in terms of sample efficiency, final performance, and stability of the policy during the training.
△ Less
Submitted 12 November, 2021; v1 submitted 2 November, 2021;
originally announced November 2021.
-
Parameter-free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients
Authors:
Baturay Saglam,
Furkan Burak Mutlu,
Dogan Can Cicek,
Suleyman Serdar Kozat
Abstract:
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing a…
▽ More
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it considerably outperforms the existing approaches and improves the state-of-the-art by a significant margin.
△ Less
Submitted 19 May, 2022; v1 submitted 24 September, 2021;
originally announced September 2021.
-
Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods
Authors:
Baturay Saglam,
Enes Duran,
Dogan C. Cicek,
Furkan B. Mutlu,
Suleyman S. Kozat
Abstract:
In value-based deep reinforcement learning methods, approximation of value functions induces overestimation bias and leads to suboptimal policies. We show that in deep actor-critic methods that aim to overcome the overestimation bias, if the reinforcement signals received by the agent have a high variance, a significant underestimation bias arises. To minimize the underestimation, we introduce a p…
▽ More
In value-based deep reinforcement learning methods, approximation of value functions induces overestimation bias and leads to suboptimal policies. We show that in deep actor-critic methods that aim to overcome the overestimation bias, if the reinforcement signals received by the agent have a high variance, a significant underestimation bias arises. To minimize the underestimation, we introduce a parameter-free, novel deep Q-learning variant. Our Q-value update rule combines the notions behind Clipped Double Q-learning and Maxmin Q-learning by computing the critic objective through the nested combination of maximum and minimum operators to bound the approximate value estimates. We evaluate our modification on the suite of several OpenAI Gym continuous control tasks, improving the state-of-the-art in every environment tested.
△ Less
Submitted 23 September, 2021; v1 submitted 22 September, 2021;
originally announced September 2021.
-
Privacy Concerns in Chatbot Interactions: When to Trust and When to Worry
Authors:
Rahime Belen Saglam,
Jason R. C. Nurse,
Duncan Hodges
Abstract:
Through advances in their conversational abilities, chatbots have started to request and process an increasing variety of sensitive personal information. The accurate disclosure of sensitive information is essential where it is used to provide advice and support to users in the healthcare and finance sectors. In this study, we explore users' concerns regarding factors associated with the use of se…
▽ More
Through advances in their conversational abilities, chatbots have started to request and process an increasing variety of sensitive personal information. The accurate disclosure of sensitive information is essential where it is used to provide advice and support to users in the healthcare and finance sectors. In this study, we explore users' concerns regarding factors associated with the use of sensitive data by chatbot providers. We surveyed a representative sample of 491 British citizens. Our results show that the user concerns focus on deleting personal information and concerns about their data's inappropriate use. We also identified that individuals were concerned about losing control over their data after a conversation with conversational agents. We found no effect from a user's gender or education but did find an effect from the user's age, with those over 45 being more concerned than those under 45. We also considered the factors that engender trust in a chatbot. Our respondents' primary focus was on the chatbot's technical elements, with factors such as the response quality being identified as the most critical factor. We again found no effect from the user's gender or education level; however, when we considered some social factors (e.g. avatars or perceived 'friendliness'), we found those under 45 years old rated these as more important than those over 45. The paper concludes with a discussion of these results within the context of designing inclusive, digital systems that support a wide range of users.
△ Less
Submitted 8 July, 2021;
originally announced July 2021.
-
Is your chatbot GDPR compliant? Open issues in agent design
Authors:
Rahime Belen Saglam,
Jason R. C. Nurse
Abstract:
Conversational agents open the world to new opportunities for human interaction and ubiquitous engagement. As their conversational abilities and knowledge has improved, these agents have begun to have access to an increasing variety of personally identifiable information and intimate details on their user base. This access raises crucial questions in light of regulations as robust as the General D…
▽ More
Conversational agents open the world to new opportunities for human interaction and ubiquitous engagement. As their conversational abilities and knowledge has improved, these agents have begun to have access to an increasing variety of personally identifiable information and intimate details on their user base. This access raises crucial questions in light of regulations as robust as the General Data Protection Regulation (GDPR). This paper explores some of these questions, with the aim of defining relevant open issues in conversational agent design. We hope that this work can provoke further research into building agents that are effective at user interaction, but also respectful of regulations and user privacy.
△ Less
Submitted 26 May, 2020;
originally announced May 2020.