Search | arXiv e-print repository

A Systematization of the Wagner Framework: Graph Theory Conjectures and Reinforcement Learning

Authors: Flora Angileri, Giulia Lombardi, Andrea Fois, Renato Faraone, Carlo Metta, Michele Salvi, Luigi Amedeo Bianchi, Marco Fantozzi, Silvia Giulia Galfrè, Daniele Pavesi, Maurizio Parton, Francesco Morandin

Abstract: In 2021, Adam Zsolt Wagner proposed an approach to disprove conjectures in graph theory using Reinforcement Learning (RL). Wagner's idea can be framed as follows: consider a conjecture, such as a certain quantity f(G) < 0 for every graph G; one can then play a single-player graph-building game, where at each turn the player decides whether to add an edge or not. The game ends when all edges have b… ▽ More In 2021, Adam Zsolt Wagner proposed an approach to disprove conjectures in graph theory using Reinforcement Learning (RL). Wagner's idea can be framed as follows: consider a conjecture, such as a certain quantity f(G) < 0 for every graph G; one can then play a single-player graph-building game, where at each turn the player decides whether to add an edge or not. The game ends when all edges have been considered, resulting in a certain graph G_T, and f(G_T) is the final score of the game; RL is then used to maximize this score. This brilliant idea is as simple as innovative, and it lends itself to systematic generalization. Several different single-player graph-building games can be employed, along with various RL algorithms. Moreover, RL maximizes the cumulative reward, allowing for step-by-step rewards instead of a single final score, provided the final cumulative reward represents the quantity of interest f(G_T). In this paper, we discuss these and various other choices that can be significant in Wagner's framework. As a contribution to this systematization, we present four distinct single-player graph-building games. Each game employs both a step-by-step reward system and a single final score. We also propose a principled approach to select the most suitable neural network architecture for any given conjecture, and introduce a new dataset of graphs labeled with their Laplacian spectra. Furthermore, we provide a counterexample for a conjecture regarding the sum of the matching number and the spectral radius, which is simpler than the example provided in Wagner's original paper. The games have been implemented as environments in the Gymnasium framework, and along with the dataset, are available as open-source supplementary materials. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2404.18405 [pdf]

Understanding and Sha** Human-Technology Assemblages in the Age of Generative AI

Authors: Josh Andres, Chris Danta, Andrea Bianchi, Sungyeon Hong, Zhuying Li, Eduardo B. Sandoval, Charles Martin, Ned Cooper

Abstract: Generative AI capabilities are rapidly transforming how we perceive, interact with, and relate to machines. This one-day workshop invites HCI researchers, designers, and practitioners to imaginatively inhabit and explore the possible futures that might emerge from humans combining generative AI capabilities into everyday technologies at massive scale. Workshop participants will craft stories, visu… ▽ More Generative AI capabilities are rapidly transforming how we perceive, interact with, and relate to machines. This one-day workshop invites HCI researchers, designers, and practitioners to imaginatively inhabit and explore the possible futures that might emerge from humans combining generative AI capabilities into everyday technologies at massive scale. Workshop participants will craft stories, visualisations, and prototypes through scenario-based design to investigate these possible futures, resulting in the production of an open-annotated scenario library and a journal or interactions article to disseminate the findings. We aim to gather the DIS community knowledge to explore, understand and shape the relations this new interaction paradigm is forging between humans, their technologies and the environment in safe, sustainable, enriching, and responsible ways. △ Less

Submitted 4 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.06407 [pdf, other]

Rethinking How to Evaluate Language Model Jailbreak

Authors: Hongyu Cai, Arjun Arunasalam, Leo Y. Lin, Antonio Bianchi, Z. Berkay Celik

Abstract: Large language models (LLMs) have become increasingly integrated with various applications. To ensure that LLMs do not generate unsafe responses, they are aligned with safeguards that specify what content is restricted. However, such alignment can be bypassed to produce prohibited content using a technique commonly referred to as jailbreak. Different systems have been proposed to perform the jailb… ▽ More Large language models (LLMs) have become increasingly integrated with various applications. To ensure that LLMs do not generate unsafe responses, they are aligned with safeguards that specify what content is restricted. However, such alignment can be bypassed to produce prohibited content using a technique commonly referred to as jailbreak. Different systems have been proposed to perform the jailbreak automatically. These systems rely on evaluation methods to determine whether a jailbreak attempt is successful. However, our analysis reveals that current jailbreak evaluation methods have two limitations. (1) Their objectives lack clarity and do not align with the goal of identifying unsafe responses. (2) They oversimplify the jailbreak result as a binary outcome, successful or not. In this paper, we propose three metrics, safeguard violation, informativeness, and relative truthfulness, to evaluate language model jailbreak. Additionally, we demonstrate how these metrics correlate with the goal of different malicious actors. To compute these metrics, we introduce a multifaceted approach that extends the natural language generation evaluation method after preprocessing the response. We evaluate our metrics on a benchmark dataset produced from three malicious intent datasets and three jailbreak systems. The benchmark dataset is labeled by three annotators. We compare our multifaceted approach with three existing jailbreak evaluation methods. Experiments demonstrate that our multifaceted evaluation outperforms existing methods, with F1 scores improving on average by 17% compared to existing baselines. Our findings motivate the need to move away from the binary view of the jailbreak problem and incorporate a more comprehensive evaluation to ensure the safety of the language model. △ Less

Submitted 7 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2307.00402 [pdf, other]

Making Sense of Constellations: Methodologies for Understanding Starlink's Scheduling Algorithms

Authors: Hammas Bin Tanveer, Mike Puchol, Rachee Singh, Antonio Bianchi, Rishab Nithyanand

Abstract: Starlink constellations are currently the largest LEO WAN and have seen considerable interest from the research community. In this paper, we use high-frequency and high-fidelity measurements to uncover evidence of hierarchical traffic controllers in Starlink -- a global controller which allocates satellites to terminals and an on-satellite controller that schedules transmission of user flows. We t… ▽ More Starlink constellations are currently the largest LEO WAN and have seen considerable interest from the research community. In this paper, we use high-frequency and high-fidelity measurements to uncover evidence of hierarchical traffic controllers in Starlink -- a global controller which allocates satellites to terminals and an on-satellite controller that schedules transmission of user flows. We then devise a novel approach for identifying how satellites are allocated to user terminals. Using data gathered with this approach, we measure the characteristics of the global controller and identify the factors that influence the allocation of satellites to terminals. Finally, we use this data to build a model which approximates Starlink's global scheduler. Our model is able to predict the characteristics of the satellite allocated to a terminal at a specific location and time with reasonably high accuracy and at a rate significantly higher than baseline. △ Less

Submitted 1 July, 2023; originally announced July 2023.

arXiv:2304.05767 [pdf, other]

A Decision Tree to Shepherd Scientists through Data Retrievability

Authors: Andrea Bianchi, Giordano d'Aloisio, Francesca Marzi, Antinisca Di Marco

Abstract: Reproducibility is a crucial aspect of scientific research that involves the ability to independently replicate experimental results by analysing the same data or repeating the same experiment. Over the years, many works have been proposed to make the results of the experiments actually reproducible. However, very few address the importance of data reproducibility, defined as the ability of indepe… ▽ More Reproducibility is a crucial aspect of scientific research that involves the ability to independently replicate experimental results by analysing the same data or repeating the same experiment. Over the years, many works have been proposed to make the results of the experiments actually reproducible. However, very few address the importance of data reproducibility, defined as the ability of independent researchers to retain the same dataset used as input for experimentation. Properly addressing the problem of data reproducibility is crucial because often just providing a link to the data is not enough to make the results reproducible. In fact, also proper metadata (e.g., preprocessing instruction) must be provided to make a dataset fully reproducible. In this work, our aim is to fill this gap by proposing a decision tree to sheperd researchers through the reproducibility of their datasets. In particular, this decision tree guides researchers through identifying if the dataset is actually reproducible and if additional metadata (i.e., additional resources needed to reproduce the data) must also be provided. This decision tree will be the foundation of a future application that will automate the data reproduction process by automatically providing the necessary metadata based on the particular context (e.g., data availability, data preprocessing, and so on). It is worth noting that, in this paper, we detail the steps to make a dataset retrievable, while we will detail other crucial aspects for reproducibility (e.g., dataset documentation) in future works. △ Less

Submitted 12 April, 2023; originally announced April 2023.

arXiv:2302.09116 [pdf, other]

Columbus: Android App Testing Through Systematic Callback Exploration

Authors: Priyanka Bose, Dipanjan Das, Saastha Vasan, Sebastiano Mariani, Ilya Grishchenko, Andrea Continella, Antonio Bianchi, Christopher Kruegel, Giovanni Vigna

Abstract: With the continuous rise in the popularity of Android mobile devices, automated testing of apps has become more important than ever. Android apps are event-driven programs. Unfortunately, generating all possible types of events by interacting with the app's interface is challenging for an automated testing approach. Callback-driven testing eliminates the need for event generation by directly invok… ▽ More With the continuous rise in the popularity of Android mobile devices, automated testing of apps has become more important than ever. Android apps are event-driven programs. Unfortunately, generating all possible types of events by interacting with the app's interface is challenging for an automated testing approach. Callback-driven testing eliminates the need for event generation by directly invoking app callbacks. However, existing callback-driven testing techniques assume prior knowledge of Android callbacks, and they rely on a human expert, who is familiar with the Android API, to write stub code that prepares callback arguments before invocation. Since the Android API is huge and keeps evolving, prior techniques could only support a small fraction of callbacks present in the Android framework. In this work, we introduce Columbus, a callback-driven testing technique that employs two strategies to eliminate the need for human involvement: (i) it automatically identifies callbacks by simultaneously analyzing both the Android framework and the app under test, and (ii) it uses a combination of under-constrained symbolic execution (primitive arguments), and type-guided dynamic heap introspection (object arguments) to generate valid and effective inputs. Lastly, Columbus integrates two novel feedback mechanisms -- data dependency and crash-guidance, during testing to increase the likelihood of triggering crashes, and maximizing coverage. In our evaluation, Columbus outperforms state-of-the-art model-driven, checkpoint-based, and callback-driven testing tools both in terms of crashes and coverage. △ Less

Submitted 17 February, 2023; originally announced February 2023.

Journal ref: International Conference on Software Engineering (ICSE), 2023

arXiv:2112.02095 [pdf, other]

doi 10.1145/3490354.3494445

Intelligent Trading Systems: A Sentiment-Aware Reinforcement Learning Approach

Authors: Francisco Caio Lima Paiva, Leonardo Kanashiro Felizardo, Reinaldo Augusto da Costa Bianchi, Anna Helena Reali Costa

Abstract: The feasibility of making profitable trades on a single asset on stock exchanges based on patterns identification has long attracted researchers. Reinforcement Learning (RL) and Natural Language Processing have gained notoriety in these single-asset trading tasks, but only a few works have explored their combination. Moreover, some issues are still not addressed, such as extracting market sentimen… ▽ More The feasibility of making profitable trades on a single asset on stock exchanges based on patterns identification has long attracted researchers. Reinforcement Learning (RL) and Natural Language Processing have gained notoriety in these single-asset trading tasks, but only a few works have explored their combination. Moreover, some issues are still not addressed, such as extracting market sentiment momentum through the explicit capture of sentiment features that reflect the market condition over time and assessing the consistency and stability of RL results in different situations. Filling this gap, we propose the Sentiment-Aware RL (SentARL) intelligent trading system that improves profit stability by leveraging market mood through an adaptive amount of past sentiment features drawn from textual news. We evaluated SentARL across twenty assets, two transaction costs, and five different periods and initializations to show its consistent effectiveness against baselines. Subsequently, this thorough assessment allowed us to identify the boundary between news coverage and market sentiment regarding the correlation of price-time series above which SentARL's effectiveness is outstanding. △ Less

Submitted 14 November, 2021; originally announced December 2021.

Comments: 9 pages, 5 figures, To appear in the Proceedings of the 2nd ACM International Conference on AI in Finance (ICAIF'21), November 3-5, 2021, Virtual Event, USA

arXiv:2012.07617 [pdf, other]

Specializing Inter-Agent Communication in Heterogeneous Multi-Agent Reinforcement Learning using Agent Class Information

Authors: Douglas De Rizzo Meneghetti, Reinaldo Augusto da Costa Bianchi

Abstract: Inspired by recent advances in agent communication with graph neural networks, this work proposes the representation of multi-agent communication capabilities as a directed labeled heterogeneous agent graph, in which node labels denote agent classes and edge labels, the communication type between two classes of agents. We also introduce a neural network architecture that specializes communication… ▽ More Inspired by recent advances in agent communication with graph neural networks, this work proposes the representation of multi-agent communication capabilities as a directed labeled heterogeneous agent graph, in which node labels denote agent classes and edge labels, the communication type between two classes of agents. We also introduce a neural network architecture that specializes communication in fully cooperative heterogeneous multi-agent tasks by learning individual transformations to the exchanged messages between each pair of agent classes. By also employing encoding and action selection modules with parameter sharing for environments with heterogeneous agents, we demonstrate comparable or superior performance in environments where a larger number of agent classes operates. △ Less

Submitted 10 March, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

Comments: Presented at the AAAI-21 Workshop on Artificial Intelligence in Games

arXiv:2009.13684 [pdf, other]

doi 10.1007/s10846-021-01336-y

Detecting soccer balls with reduced neural networks: a comparison of multiple architectures under constrained hardware scenarios

Authors: Douglas De Rizzo Meneghetti, Thiago Pedro Donadon Homem, Jonas Henrique Renolfi de Oliveira, Isaac Jesus da Silva, Danilo Hernani Perico, Reinaldo Augusto da Costa Bianchi

Abstract: Object detection techniques that achieve state-of-the-art detection accuracy employ convolutional neural networks, implemented to have optimal performance in graphics processing units. Some hardware systems, such as mobile robots, operate under constrained hardware situations, but still benefit from object detection capabilities. Multiple network models have been proposed, achieving comparable acc… ▽ More Object detection techniques that achieve state-of-the-art detection accuracy employ convolutional neural networks, implemented to have optimal performance in graphics processing units. Some hardware systems, such as mobile robots, operate under constrained hardware situations, but still benefit from object detection capabilities. Multiple network models have been proposed, achieving comparable accuracy with reduced architectures and leaner operations. Motivated by the need to create an object detection system for a soccer team of mobile robots, this work provides a comparative study of recent proposals of neural networks targeted towards constrained hardware environments, in the specific task of soccer ball detection. We train multiple open implementations of MobileNetV2 and MobileNetV3 models with different underlying architectures, as well as YOLOv3, TinyYOLOv3, YOLOv4 and TinyYOLOv4 in an annotated image data set captured using a mobile robot. We then report their mean average precision on a test data set and their inference times in videos of different resolutions, under constrained and unconstrained hardware configurations. Results show that MobileNetV3 models have a good trade-off between mAP and inference time in constrained scenarios only, while MobileNetV2 with high width multipliers are appropriate for server-side inference. YOLO models in their official implementations are not suitable for inference in CPUs. △ Less

Submitted 21 February, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: 11-page version of a ~24-page version published in the Journal of Intelligent & Robotics Systems

arXiv:2009.13161 [pdf, other]

doi 10.5753/eniac.2020.12161

Towards Heterogeneous Multi-Agent Reinforcement Learning with Graph Neural Networks

Authors: Douglas De Rizzo Meneghetti, Reinaldo Augusto da Costa Bianchi

Abstract: This work proposes a neural network architecture that learns policies for multiple agent classes in a heterogeneous multi-agent reinforcement setting. The proposed network uses directed labeled graph representations for states, encodes feature vectors of different sizes for different entity classes, uses relational graph convolution layers to model different communication channels between entity t… ▽ More This work proposes a neural network architecture that learns policies for multiple agent classes in a heterogeneous multi-agent reinforcement setting. The proposed network uses directed labeled graph representations for states, encodes feature vectors of different sizes for different entity classes, uses relational graph convolution layers to model different communication channels between entity types and learns distinct policies for different agent classes, sharing parameters wherever possible. Results have shown that specializing the communication channels between entity classes is a promising step to achieve higher performance in environments composed of heterogeneous entities. △ Less

Submitted 20 October, 2020; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: Published and presented at the XVI Encontro Nacional de Inteligência Artificial e Computacional (ENIAC). Fixed notation

arXiv:1904.03949 [pdf, other]

Improving Image Classification Robustness through Selective CNN-Filters Fine-Tuning

Authors: Alessandro Bianchi, Moreno Raimondo Vendra, Pavlos Protopapas, Marco Brambilla

Abstract: Image quality plays a big role in CNN-based image classification performance. Fine-tuning the network with distorted samples may be too costly for large networks. To solve this issue, we propose a transfer learning approach optimized to keep into account that in each layer of a CNN some filters are more susceptible to image distortion than others. Our method identifies the most susceptible filters… ▽ More Image quality plays a big role in CNN-based image classification performance. Fine-tuning the network with distorted samples may be too costly for large networks. To solve this issue, we propose a transfer learning approach optimized to keep into account that in each layer of a CNN some filters are more susceptible to image distortion than others. Our method identifies the most susceptible filters and applies retraining only to the filters that show the highest activation maps distance between clean and distorted images. Filters are ranked using the Borda count election method and then only the most affected filters are fine-tuned. This significantly reduces the number of parameters to retrain. We evaluate this approach on the CIFAR-10 and CIFAR-100 datasets, testing it on two different models and two different types of distortion. Results show that the proposed transfer learning technique recovers most of the lost performance due to input data distortion, at a considerably faster pace with respect to existing methods, thanks to the reduced number of parameters to fine-tune. When few noisy samples are provided for training, our filter-level fine tuning performs particularly well, also outperforming state of the art layer-level transfer learning approaches. △ Less

Submitted 8 April, 2019; originally announced April 2019.

Comments: arXiv admin note: text overlap with arXiv:1705.02406 by other authors

arXiv:1903.03411 [pdf, other]

Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles

Authors: Thiago Freitas dos Santos, Paulo E. Santos, Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Pedro Cabalar

Abstract: Spatial puzzles composed of rigid objects, flexible strings and holes offer interesting domains for reasoning about spatial entities that are common in the human daily-life's activities. The goal of this work is to investigate the automated solution of this kind of puzzles adapting an algorithm that combines Answer Set Programming (ASP) with Markov Decision Process (MDP), algorithm oASP(MDP), to u… ▽ More Spatial puzzles composed of rigid objects, flexible strings and holes offer interesting domains for reasoning about spatial entities that are common in the human daily-life's activities. The goal of this work is to investigate the automated solution of this kind of puzzles adapting an algorithm that combines Answer Set Programming (ASP) with Markov Decision Process (MDP), algorithm oASP(MDP), to use heuristics accelerating the learning process. ASP is applied to represent the domain as an MDP, while a Reinforcement Learning algorithm (Q-Learning) is used to find the optimal policies. In this work, the heuristics were obtained from the solution of relaxed versions of the puzzles. Experiments were performed on deterministic, non-deterministic and non-stationary versions of the puzzles. Results show that the proposed approach can accelerate the learning process, presenting an advantage when compared to the non-heuristic versions of oASP(MDP) and Q-Learning. △ Less

Submitted 15 February, 2019; originally announced March 2019.

Comments: Submitted to Journal of Heuristics

arXiv:1803.02307 [pdf]

doi 10.1145/2984511.2984550

RealPen: Providing Realism in Handwriting Tasks on Touch Surfaces using Auditory-Tactile Feedback

Authors: Youngjun Cho, Andrea Bianchi, Nicolai Marquardt, Nadia Bianchi-Berthouze

Abstract: We present RealPen, an augmented stylus for capacitive tablet screens that recreates the physical sensation of writing on paper with a pencil, ball-point pen or marker pen. The aim is to create a more engaging experience when writing on touch surfaces, such as screens of tablet computers. This is achieved by re-generating the friction-induced oscillation and sound of a real writing tool in contact… ▽ More We present RealPen, an augmented stylus for capacitive tablet screens that recreates the physical sensation of writing on paper with a pencil, ball-point pen or marker pen. The aim is to create a more engaging experience when writing on touch surfaces, such as screens of tablet computers. This is achieved by re-generating the friction-induced oscillation and sound of a real writing tool in contact with paper. To generate realistic tactile feedback, our algorithm analyses the frequency spectrum of the friction oscillation generated when writing with traditional tools, extracts principal frequencies, and uses the actuator's frequency response profile for an adjustment weighting function. We enhance the realism by providing the sound feedback aligned with the writing pressure and speed. Furthermore, we investigated the effects of superposition and fluctuation of several frequencies on human tactile perception, evaluated the performance of RealPen, and characterized users' perception and preference of each feedback type. △ Less

Submitted 6 March, 2018; originally announced March 2018.

Comments: Proceedings of the 29th Annual Symposium on User Interface Software and Technology (UIST '16)

arXiv:1707.06560 [pdf]

doi 10.1080/17445760.2017.1288808

An ASM-based Characterization of Starvation-free Systems

Authors: Alessandro Bianchi, Sebastiano Pizzutilo, Gennaro Vessio

Abstract: Abstract State Machines (ASMs) have been successfully applied for modeling critical and complex systems in a wide range of application domains. However, unlike other well-known formalisms, e.g. Petri nets, ASMs lack inherent, domain-independent characterisations of computationally important properties. Here, we provide an ASM-based characterisation of the starvation-free property. The classic, inf… ▽ More Abstract State Machines (ASMs) have been successfully applied for modeling critical and complex systems in a wide range of application domains. However, unlike other well-known formalisms, e.g. Petri nets, ASMs lack inherent, domain-independent characterisations of computationally important properties. Here, we provide an ASM-based characterisation of the starvation-free property. The classic, informal notion of starvation, usually provided in literature, is analysed and expressed as a necessary condition in terms of ASMs. Thus, we enrich the ASM framework with the notion of vulnerable rule as a practical tool for analysing starvation issues in an operational fashion △ Less

Submitted 20 July, 2017; originally announced July 2017.

Comments: This is an Accepted Manuscript of an article published by Taylor & Francis Group in the International Journal of Parallel, Emergent & Distributed Systems on 14/02/2017, available online: http://www.tandfonline.com/10.1080/17445760.2017.1288808

Journal ref: International Journal of Parallel, Emergent & Distributed Systems (2017)

arXiv:1706.01417 [pdf, other]

A method for the online construction of the set of states of a Markov Decision Process using Answer Set Programming

Authors: Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Paulo E. Santos, Ramon Lopez de Mantaras

Abstract: Non-stationary domains, that change in unpredicted ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named {\em Online ASP for MDP} (oASP(MDP)), which is a method capable of constructing the set of domain states while the agent interacts with… ▽ More Non-stationary domains, that change in unpredicted ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named {\em Online ASP for MDP} (oASP(MDP)), which is a method capable of constructing the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules represent a set of domain constraints that are processed as ASP programs reducing the search space. Results show that oASP(MDP) is capable of finding solutions for problems in non-stationary domains without interfering with the action-value function approximation process. △ Less

Submitted 5 June, 2017; originally announced June 2017.

Comments: Submitted to IJCAI 17

arXiv:1705.01399 [pdf, other]

Answer Set Programming for Non-Stationary Markov Decision Processes

Authors: Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Paulo E. Santos, Ramon Lopez de Mantaras

Abstract: Non-stationary domains, where unforeseen changes happen, present a challenge for agents to find an optimal policy for a sequential decision making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to fin… ▽ More Non-stationary domains, where unforeseen changes happen, present a challenge for agents to find an optimal policy for a sequential decision making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, from where Reinforcement Learning is applied to learn the optimal policy of the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing non-stationary domains. △ Less

Submitted 3 May, 2017; originally announced May 2017.

Showing 1–16 of 16 results for author: Bianchi, A